(57) Abstract

The invention provides DNA that can be introduced into the genomes of plants to produce genetically-modified plants having higher
levels of squalene than the natural plants. The DNA corresponds to a squalene epoxidase gene of the same or a related plant, and may have
the sequence as shown by SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:9 or SEQ ID NO:11; or a part of such a sequence
or a sequence having at least 60% homology with such a sequence. The DNA is introduced into the genome in a way that results in
down-regulation of an endogenous plant squalene epoxidase gene to suppress the expression of squalene epoxidase. The invention also
relates to a process of producing genetically-modified plants, plasmids and vectors used in the method, genetically-modified plants and
seeds thereof, and a method of producing squalene from the modified plants.

TITLE: PROCESS OF RAISING SQUALENE LEVELS IN PLANTS
AND DNA SEQUENCES USED THEREFOR

TECHNICAL FIELD

This invention relates to the production of squalene
for commercial and industrial uses. More particularly,
the invention relates to a process by which natural
squalene levels in plants can be increased, and to
nucleotide sequences that can be introduced into plants
to cause the desired increase, and plasmids, vectors,
etc., useful in the process.

BACKGROUND ART

There is a US$125 million per annum market for
squalene, a colourless oil used in the cosmetics and
health industries (Kaiya, 1990). Squalene is currently
obtained mainly from shark liver, but it also occurs in
small quantities in vegetable oils. Squalene extracted
from shark liver is declining in supply (Kaiya, 1990) and
the harvesting of sharks for this purpose is in any case
environmentally unfriendly and is becoming less
acceptable as environmental concerns increase in society.

Squalene can be extracted from olive oil, although
the amounts are not sufficient to supply even the
cosmetics market (Bondioli et al. 1992; Bondioli et al.
1993). Squalene could be extracted from other vegetable
oils, but the levels of the hydrocarbon in the oil are
too low for this to be economically viable. There are at
present no Canadian crops used for squalene production.
It has been suggested that, if the levels of squalene
occurring in oilseeds could be increased, the traditional
source of squalene could be replaced by oilseed crops, to
the benefit of both the environment and those countries,
such as Canada, that grow crops of this kind in
abundance. Many vegetable oils undergo deodorization by
vacuum distillation as a routine part of refining. Most
of the squalene in the oil can be recovered in the
deodorizer distillate which is a by-product of this
process (Bondioli et al., 1993). Typically, squalene is
concentrated more than one hundred fold in the deodorizer
distillate relative to the levels in unrefined vegetable
oils. For commercial viability, vegetable oil deodorizer
distillates should contain at least 5% (w/w) squalene.
Currently, soybean and canola deodorizer distillates
contain squalene in the 0.1-3% range (Ramamurthi, S.,
1994). Consequently, an increase of two-fold or more in
the squalene content of these oilseeds could result in
commercially viable squalene production from vegetable
oils.
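The thresholds quoted above can be checked with a short calculation. The following Python sketch is illustrative only: it assumes, as a simplification not stated in the text, that the squalene fraction of the deodorizer distillate scales linearly with the squalene content of the oilseed, and the constant and function names are hypothetical.

```python
# Figures quoted in the text (fractions by weight).
VIABLE_DISTILLATE_FRACTION = 0.05         # at least 5% (w/w) squalene
CURRENT_DISTILLATE_RANGE = (0.001, 0.03)  # 0.1-3% in soybean and canola
CONCENTRATION_FACTOR = 100                # "more than one hundred fold" (lower bound)


def fold_increase_needed(current_fraction, target=VIABLE_DISTILLATE_FRACTION):
    """Fold increase required to lift a distillate to the target fraction.

    Assumption: distillate squalene scales linearly with seed squalene.
    """
    if current_fraction <= 0:
        raise ValueError("current_fraction must be positive")
    return target / current_fraction


def crude_oil_fraction(distillate_fraction, factor=CONCENTRATION_FACTOR):
    """Approximate squalene fraction in unrefined oil for a given distillate."""
    return distillate_fraction / factor


if __name__ == "__main__":
    low, high = CURRENT_DISTILLATE_RANGE
    print(f"fold increase needed at 3%:   {fold_increase_needed(high):.2f}")
    print(f"fold increase needed at 0.1%: {fold_increase_needed(low):.0f}")
    print(f"crude oil fraction for a 5% distillate: {crude_oil_fraction(0.05):.4%}")
```

Under that assumption, a two-fold increase is sufficient only for distillates already near the top of the quoted range (3% would become 6%); a distillate at 0.1% would need an increase of roughly fifty-fold.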

It has been shown that in plant cell cultures,
squalene accumulates in the presence of squalene
epoxidase inhibitors, e.g. allylamines such as
terbinafine (Yates et al. 1991). Apparently, much of the
squalene produced in plants is converted to the epoxide
by squalene epoxidase, and ultimately to plant sterols.
In fact, all plants and higher life forms contain squalene
and squalene epoxidase genes, but little squalene
accumulates in the tissues of such life forms because of
the effects of the expressed squalene epoxidase.
Therefore, inhibition of the epoxidase gives squalene an
opportunity to accumulate. However, there are as yet no
commercial processes based on this concept.

A main problem addressed by the inventors of the
present invention is therefore to create a plant crop,
particularly an oilseed crop, which accumulates squalene
in harvestable tissues, such as seeds, at sufficient
levels for commercially-viable extraction.

DISCLOSURE OF THE INVENTION

An object of the present invention is to provide new 
sources of squalene that have the potential to be 
exploited on a commercial basis to replace conventional 
commercial sources of squalene. 

Another object of the present invention is to
generate squalene-producing plants modified to accumulate
squalene in the plant tissue (e.g. in seeds) in
sufficient quantities to make the extraction of squalene
commercially attractive.

Another object of the invention is to identify
squalene epoxidase genes in plants, and to partially or
completely neutralise the expression of such genes.

Another object of the invention is to produce DNA
clones, constructs and vectors suitable for modifying the
genomes of plants to reduce expression of squalene
epoxidase.

Yet another object of the invention is to provide a 
commercial process for producing squalene from plant 
tissue, especially seeds. 

The inventors of the present invention have
discovered the DNA sequences of the genes encoding
squalene epoxidase (squalene monooxygenase
(2,3-epoxidizing); EC 1.14.99.7) from the plants Arabidopsis
thaliana (thale cress) and Brassica napus (rapeseed,
canola), as well as a second gene from Arabidopsis and
one from Ricinus communis (castor), and using this
knowledge have developed a process of modifying the
genomes of such plants to produce genetically-modified
plants which accumulate squalene at higher than natural
levels. Moreover, the process may be operated to
increase squalene levels in plants using DNA based on
squalene epoxidase genes from different but related
plants.

According to one aspect of the invention, there is
provided an isolated and cloned DNA (polynucleotide)
suitable for introduction into a genome of a plant to
suppress expression of squalene epoxidase by said plant
below natural levels, wherein the DNA has a sequence
corresponding at least in part to a squalene epoxidase
gene of a plant.

The DNA preferably has a sequence corresponding to
all or part of a specific sequence selected from SEQ ID
NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:9 and SEQ ID
NO:10 (as shown in the following Sequence Listing); or
having at least 60% (more preferably at least 70%)
homology thereto.

The measure of homology between two DNA
(polynucleotide) sequences as used in this specification
is the similarity index given by application of the
Wilbur-Lipman algorithm of the MEGALIGN® computer
program (DNASTAR) in aligning and comparing DNA sequences
corresponding to a complete polypeptide coding region
using the parameters ktuple=3, gap penalty=3 and
window=20.
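To make the 60% and 70% thresholds concrete, the following Python sketch computes a plain percent identity between two sequences that have already been aligned. It is not the Wilbur-Lipman algorithm and will not reproduce the MEGALIGN® similarity index, which treats gaps and k-tuples in its own way; the function names and the convention of skipping gapped columns are assumptions made for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical positions over the ungapped columns of an alignment.

    Both inputs must be the same length (already aligned, gaps as '-').
    This is a simplified stand-in, not the MEGALIGN similarity index.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def meets_homology_threshold(aligned_a, aligned_b, threshold=60.0):
    """True if the simplified identity reaches the given percentage."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    a = "ATGC-ATTGC"
    b = "ATGGTATCGC"
    print(f"{percent_identity(a, b):.1f}% identical over ungapped columns")
    print("meets 60%:", meets_homology_threshold(a, b))
    print("meets 70%:", meets_homology_threshold(a, b, threshold=70.0))
```

For the figures used in this specification, the sequences compared would be the complete coding regions, and the value of record is the one reported by the MEGALIGN® program with the parameters stated above.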

According to another aspect of the invention, there
is provided a process of producing genetically-modified
plants having increased levels of squalene in tissues of
the plants compared to corresponding wild-type plants,
wherein the plant genome is modified to suppress
expression of squalene epoxidase by said plant. The
genome is modified by introducing at least one exogenous
DNA sequence that corresponds, at least in part, to one
or more endogenous squalene epoxidase genes of the plant.

The DNA sequence introduced into said plant genome
has at least 60%, and more preferably at least 70%,
homology to said one or more of the endogenous squalene
epoxidase genes, and is preferably all or part of a
sequence selected from SEQ ID NO:1, SEQ ID NO:3, SEQ ID
NO:5, SEQ ID NO:9 and SEQ ID NO:10.

According to yet another aspect of the invention, at
least in a preferred form, there is provided a process of
producing genetically-modified plants having increased
levels of squalene in tissues of the plants compared to
corresponding wild-type plants, wherein the plant genome
is modified to suppress expression of squalene epoxidase
by said plant, thereby raising squalene levels of the
plant, by introducing into the genome of the plant a
nucleotide sequence that reduces or prevents expression
of squalene epoxidase. The DNA introduced into the genome
includes a transcriptional promoter and a sequence that,
when transcribed from the promoter, is complementary or
antisense to all or part of at least one squalene
epoxidase messenger RNA produced by the plant.

The invention also relates to plasmids and vectors 
used in the processes indicated above, and as disclosed 
later. 

The invention further relates to a genetically-modified
plant capable of accumulating squalene at levels
higher than the corresponding wild-type plant, produced
by a process as indicated above, or a seed of such a
plant.

The invention additionally relates to a process of
producing squalene, which involves growing a
genetically-modified plant as defined above, harvesting the plant or
seeds of the plant, and extracting squalene from the
harvested plant or seeds.
BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows the alignment of deduced amino acid
sequences of the clones pDR111 (B. napus 111) [SEQ ID
NO:4], pDR411 (B. napus 411) [SEQ ID NO:11] and 129F12T7
(Arabidopsis) [SEQ ID NO:2], and of the known squalene
epoxidase genes of mouse (DNA Database of Japan D42048)
[SEQ ID NO:6], rat (DNA Database of Japan D37920) [SEQ ID
NO:7], and baker's yeast (Genbank M64994) [SEQ ID NO:8];
the alignment was done using the MEGALIGN™ program of
the LASERGENE™ suite of programs (DNASTAR) using a
multiple alignment gap penalty of 20; and

Figures 2, 3 and 4 are plasmid maps of three vectors
(pSE111A, pSE411A and pSE129A, respectively) produced
according to one embodiment of the present invention.

BEST MODES FOR CARRYING OUT THE INVENTION

General Discussion

The concept underlying the present invention is to
identify squalene epoxidase genes of oilseed plants (or
possibly other plants, since all plants appear to have
genes for the production of squalene, and particularly
those plants that are capable of accumulating squalene in
their harvestable tissue) and then to use that knowledge
to create genetically-modified plants in which the
expression of squalene epoxidase is decreased partially
or fully compared to the natural level of expression, so
that squalene naturally produced by the plants can
accumulate in the seeds or other tissue to levels that
make extraction commercially attractive.

The approach taken by the inventors of the present
invention to identify squalene epoxidase genes of plants
was initially to use the DNA sequence of a known squalene
epoxidase gene from yeast to identify equivalent genes in
suitable plant species, e.g. by heterologous
hybridization, on the assumption that all squalene
epoxidase genes will have a considerable degree of
similarity. Once one or several plant squalene epoxidase
genes have been identified in this way, those plant genes
can then be used to identify additional squalene
epoxidase genes from other plants.

Heterologous Hybridization 

Nucleic acid hybridization is a technique used to
identify specific nucleic acids from a mixture. Southern
analysis is a type of nucleic acid hybridization in which
DNA is typically digested with restriction enzymes,
separated by gel electrophoresis and bound to a
nitrocellulose or nylon membrane. A nucleic acid probe,
which is typically radio-labeled or otherwise rendered
easily detectable, is hybridized to the bound DNA by
exposing it to the membrane-bound DNA under specific
conditions and washing any unbound or loosely bound probe
away. The location of the bound probe is then detected
by autoradiography or other detection method. The
location of the bound probe is an indication that DNA
sequences that are similar to those in the probe nucleic
acid are present. Hybridization may also be done with
DNA of clones of a recombinant DNA library, such as a
cDNA library, when that DNA has been bound to a membrane
after plating the library out (Ausubel et al., 1994).

Of course, the method used by the inventors to identify
the genes disclosed in the present application may be
used to identify equivalent genes from other plants. As
noted above, the process originally used by the inventors
to identify the Arabidopsis gene was based on further
analysis of a gene that was tentatively identified from a
publicly available database containing partial sequences
(Expressed Sequence Tags or ESTs) submitted by other
workers from randomly chosen (unidentified) gene clones.
ESTs from other species (such as rice, castor) can also
be searched in the same way to find other possible
squalene epoxidase genes present in such plants
(depending on the more or less accidental sequencing of
the desired genes) using the Arabidopsis and B. napus
sequences disclosed herein.

The inventors have, for example, found other ESTs
from plants that have tentatively been identified as
squalene epoxidase genes by comparing them to the
Arabidopsis and B. napus sequences discussed above.
Thus, sequences corresponding to Genbank Accession
Numbers T15019 (obtainable from Dr. C.R. Somerville,
Carnegie Institution, 290 Panama St., Stanford, CA 94305,
USA) and W43353 (obtainable from DNA Stock Center,
Arabidopsis Biological Resource Center, Ohio State
University, 1060 Carmack Road, Columbus, OH 43210-1002,
USA) have been predicted to correspond to squalene
epoxidase genes from Ricinus communis (castor) and
Arabidopsis (a second Arabidopsis gene), respectively.

Perhaps more importantly, the process by which the
B. napus gene was cloned can be used to clone the gene
from other plant species. The (heterologous hybridization)
methods are well known, but the process requires the
knowledge and use of the novel plant squalene epoxidase
sequences disclosed in this application.

If the hybridization and washing are done under
conditions which are considered stringent (e.g., at
relatively high temperature and/or low salt and/or high
formamide concentration), then the sequences detected
generally have a high degree of similarity to the probe
nucleic acid. If hybridization and washing are done at
lower stringency, then it is possible to detect sequences
that are lower in similarity to the probe. Discussions
of this detection of similar sequences by hybridization
can be found in Beltz et al. (1983) and Yamamoto and
Kadowaki (1995). From the point of view of gene cloning,
if one obtains a clone for a gene in one organism, one
can use low stringency hybridization of the DNA clones
corresponding to a related organism to detect the
homologous gene sequences of that organism. As mentioned
before, the success of this approach depends on the
similarity of the sequences of the homologous genes which
in turn generally depends on the evolutionary
relationship between the organisms.
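The effect of temperature, salt and formamide on stringency can be illustrated with a commonly quoted empirical approximation for the melting temperature (Tm) of long DNA-DNA hybrids. The formula and its coefficients are not taken from this specification; they are a textbook estimate whose coefficients vary somewhat between sources, and the function names below are hypothetical.

```python
import math


def estimate_tm_celsius(na_molar, percent_gc, percent_formamide, hybrid_length_bp):
    """Approximate Tm of a long DNA-DNA hybrid (empirical textbook estimate).

    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 0.61*(%formamide) - 500/L
    Coefficients differ slightly between sources; treat the result as a guide.
    """
    if na_molar <= 0 or hybrid_length_bp <= 0:
        raise ValueError("salt concentration and hybrid length must be positive")
    return (
        81.5
        + 16.6 * math.log10(na_molar)
        + 0.41 * percent_gc
        - 0.61 * percent_formamide
        - 500.0 / hybrid_length_bp
    )


def degrees_below_tm(wash_temp_c, tm_c):
    """Margin between the wash temperature and Tm; a smaller margin is more stringent."""
    return tm_c - wash_temp_c


if __name__ == "__main__":
    high_salt = estimate_tm_celsius(1.0, 50.0, 0.0, 500)
    low_salt = estimate_tm_celsius(0.1, 50.0, 0.0, 500)
    with_formamide = estimate_tm_celsius(1.0, 50.0, 50.0, 500)
    for label, tm in [("1.0 M Na+", high_salt),
                      ("0.1 M Na+", low_salt),
                      ("1.0 M Na+, 50% formamide", with_formamide)]:
        print(f"{label}: Tm ~ {tm:.1f} C, margin at 65 C wash = "
              f"{degrees_below_tm(65.0, tm):.1f} C")
```

Lowering the salt concentration or adding formamide lowers the estimated Tm, so a wash at a fixed temperature sits closer to Tm and only well-matched hybrids remain bound. A frequently quoted rule of thumb, likewise not from this specification, is that each 1% of base mismatch lowers Tm by roughly 1 degree C, which is why low-stringency conditions are needed to retain hybrids between genes of more distantly related plants.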

Once identified, sequenced and cloned, the DNA of
suitable plant species may then be modified or
manipulated with any technique capable of decreasing the
expression of a natural gene based on an isolated DNA
clone corresponding, at least in part, to that gene.
Suitable methods, at present, include antisense
technologies (Bourque, 1995), co-suppression or gene
silencing technologies (Meyer, 1995; Stam et al., 1997;
Matzke and Matzke, 1995), and ribozyme technologies
(Wegener et al. 1994; Barinaga, 1993).

These technologies are discussed in more detail
below.

Down-regulation of Gene expression

General

The activity of a particular enzyme, such as
squalene epoxidase, is dependent on, among other things
(such as the biochemical environment), the amount of
enzyme (usually, and for the sake of this argument, a
protein) that is present. The amount of enzyme present
depends on the expression of the gene or genes encoding
the enzyme of interest. Gene expression usually includes
(not necessarily in this order) transcription of DNA to
generate RNA, processing of the RNA produced from
transcription, transport of RNA to the site of
translation, translation of mature messenger RNA into
polypeptide, proteolytic processing and folding of the
nascent polypeptide, transport of the protein product to
various cellular compartments, and post-translational
modification of the protein (such as phosphorylation or
glycosylation). Any effect or difference in any of the
processes involved in gene expression can have an effect
on the level of expression of an enzyme encoded by a
given gene or genes. Gene expression often varies with
cell type, tissue type and developmental stage.
Likewise, enzyme levels in different cells and tissues
and at different developmental stages vary widely. (For
plant nuclear genes, this is often the result of
differential transcription.)

Gene expression can also be affected by the
breakdown of the gene product, the enzyme, or any of the
intermediates in gene expression, such as precursor RNA.

From a genetic engineering point of view, in
principle, gene expression can be down-regulated by
affecting almost any of the processes involved. For
example, although the mechanism is not well established,
antisense technology (as discussed below) decreases the
amount of translatable messenger RNA (mRNA) in an
organism.

A) Antisense technology

An appropriate antisense technology is disclosed,
for example, in US patent 5,190,931 issued on March 2,
1993 to Masayori Inouye. The disclosure of this patent
is incorporated herein by reference. In short, this
technology can be used to regulate or inhibit gene
expression in a cell by incorporating into the genetic
material of the cell a nucleic acid sequence which is
transcribed to produce an mRNA which is complementary to
and capable of binding to the mRNA produced by the
genetic material of the cell. The introduced nucleic
acid sequences include equivalents of the gene to be
regulated, or parts thereof, oriented in antisense
fashion relative to a transcriptional promoter. Thus,
the squalene epoxidase sequence, or part thereof, is
introduced into the genetic material of the cell as a
construct positioned between a transcriptional promoter
segment and a transcriptional termination segment. The
mRNA produced when the antisense sequences are
transcribed binds or hybridizes to the mRNA from the
squalene epoxidase gene of interest and prevents
translation to a corresponding protein. Therefore, the
protein coded for by the gene is not produced, or is
produced in smaller quantities than would otherwise be
the case. By introducing a gene that has a sequence that
is antisense to the natural squalene epoxidase gene in
oilseed plants, the epoxidation of squalene can be
inhibited or reduced so that squalene accumulates in the
plant tissues, especially the seeds, which can then be
harvested in the usual way and the squalene extracted
using conventional techniques.

WO 97/34003    PCT/CA97/00175

In terms of the process of antisense down-regulation
of squalene epoxidase genes, for any plant species, it is
generally necessary to use a gene from a closely related
plant such that the genes are more than about 60%, and
preferably about 70%, identical at the DNA level (Murphy,
1996). Thus, homologous (equivalent) genes from the same
family of plants would reasonably be expected to give an
antisense effect on any member species of that family.
For example, Arabidopsis genes have been found to have
antisense effects in B. napus (Murphy, 1996).

The antisense DNA in expressible form may be
introduced into plant cells by any suitable
transformation technique, e.g. in planta transformation
(such as wound inoculation or vacuum infiltration).
Transformation may also be carried out by co-cultivation
of cotyledonary petioles and hypocotyl explants (e.g. of
B. napus and B. carinata) with A. tumefaciens bearing
suitable constructs (Moloney et al. (1989) and DeBlock et
al. (1989)).

It would, of course, be optimal to identify a
natural squalene epoxidase gene for each plant species to
be modified in order to ensure complete correspondence of
the DNA used to modify the natural gene and the DNA of
the natural gene itself. If a gene from one plant
species has been cloned, there are methods available to
clone the same gene from other plants. The reliability
of these methods (heterologous hybridization methods)
depends on the similarity of the DNA sequence of the
genes. If the DNA sequences have at least 60% of their
sequence identical, and more preferably at least 70%,
then the methods are usually reliable. Sequence
similarity depends mostly on evolutionary (ancestral)
relationships between plants. Practically, this means
that either of the two genes first cloned by the
inventors (the Arabidopsis and B. napus genes) may be
used to clone the same gene in any other dicotyledonous
plant (dicot), including, but not limited to, soybean,
tobacco, amaranth, potato, cotton, flax, bean, and pea.
It is also reasonable to assume that the Arabidopsis or
B. napus genes could also be used to clone the same genes
from monocotyledonous plants (monocots), such as wheat,
corn and barley.

The antisense effect occurs when hybridization can
occur between antisense RNA and native RNA under the
conditions prevailing in the cell. This may occur when
the antisense RNA (and corresponding cDNA) contains as
few as 20 nucleotides. More preferably, however, there
should be at least 100 nucleotides in the cDNA to
guarantee the required effect, and of course any larger
portion up to the entire cDNA may be employed. In short,
therefore, for effective antisense technology, the DNA
sequence introduced into the plant genome should
preferably be at least 20 consecutive nucleotides
corresponding to the native squalene epoxidase gene, and
more preferably between 100 nucleotides and the full DNA
sequence of the gene. The homology of the added sequence
may be at least 60%, and more preferably at least 70%,
to the native plant gene.
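The orientation and length requirements described above can be sketched in a few lines of Python. The sketch takes a stretch of a cDNA (written as the sense strand, 5' to 3') and returns the sequence that would be placed downstream of the promoter so that its transcript is complementary to the native mRNA. The function names and the example sequence are hypothetical and are not taken from the Sequence Listing; only the 20 and 100 nucleotide figures come from the text.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

MIN_LENGTH = 20         # "as few as 20 nucleotides"
PREFERRED_LENGTH = 100  # "at least 100 nucleotides" preferred


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_fragment(cdna_sense, start, length):
    """Antisense-oriented insert for cdna_sense[start:start + length].

    Raises ValueError if the fragment is shorter than MIN_LENGTH or
    does not fit inside the cDNA.
    """
    if start < 0 or length < MIN_LENGTH or start + length > len(cdna_sense):
        raise ValueError("fragment must be >= 20 nt and lie within the cDNA")
    return reverse_complement(cdna_sense[start:start + length])


def length_category(length):
    """Classify a fragment length against the figures given in the text."""
    if length < MIN_LENGTH:
        return "too short"
    if length < PREFERRED_LENGTH:
        return "minimum"
    return "preferred"


if __name__ == "__main__":
    example_cdna = "ATGGCTTCAGGATCCTTAGCAGGTCTAGAA"  # made-up 30 nt sequence
    insert = antisense_fragment(example_cdna, 0, 24)
    print(insert, length_category(len(insert)))
```

Whether a given fragment actually produces an antisense effect depends on its similarity to the native gene and on the conditions in the cell, as discussed above; the length check is only a first filter.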


B) Ribozyme Technology 

Another method for downregulating gene expression by
affecting mRNA levels is ribozyme technology. Ribozymes
are RNA molecules capable of catalyzing the cleavage of
RNA and other nucleic acids. In nature, Tetrahymena
preribosomal RNA, some viroids, virusoids and satellite
RNAs of plant viruses perform self-cleavage reactions.
The cleavage site for some plant pathogenic RNAs consists
of a consensus structure, called the "hammerhead" motif.
The cleavage occurs within this hammerhead 3' to a GUX
triplet, where X can be C, U, or A. The nucleotide
region directing the catalysis of the cleavage reaction
can be separated from the region where the cleavage
occurs and the recognition of the target RNA can be
modified by changing the nucleotide sequence of the
regions flanking the cleavage site. As a consequence,
ribozymes can be designed to catalyze cleavage reactions
on targeted sequences of separate RNA substrates. This
provides a means of regulating gene expression, if the
DNA sequence of the gene is known.
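As an illustration of the first step in designing such a ribozyme, the following Python sketch lists the GUX triplets (X = C, U or A) in a target RNA and the position immediately 3' to each triplet, where cleavage would occur. It is a minimal sketch under stated assumptions: the function name and example sequence are hypothetical, DNA input is simply converted to RNA, and no attempt is made to assess flanking sequence or RNA secondary structure.

```python
import re

# Zero-width lookahead so that overlapping triplets are all reported.
_GUX = re.compile(r"(?=(GU[CUA]))")


def find_gux_sites(target):
    """Return (triplet_start, triplet, cleavage_index) for each GUX site.

    Indices are 0-based; cleavage_index is the position of the first
    nucleotide 3' to the triplet, i.e. cleavage falls just before it.
    """
    rna = target.upper().replace("T", "U")
    return [(m.start(), m.group(1), m.start() + 3) for m in _GUX.finditer(rna)]


if __name__ == "__main__":
    example_rna = "AAGUCGGUAGUG"  # made-up sequence for illustration
    for start, triplet, cut in find_gux_sites(example_rna):
        print(f"{triplet} at {start}, cleavage before index {cut}")
```

In a real design the sequences flanking a chosen site would then be used to define the arms of the ribozyme that recognise the target, as the paragraph above describes.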

In order to genetically engineer the down-regulation
of a particular gene in plants, a vector can be
constructed for transformation that includes one or more
units, each of which may include a transcriptional
promoter and a sequence encoding a ribozyme designed to
cleave RNA transcribed from the gene or genes of
interest. An example of this in plants has been provided
by Schreier and co-workers (Steinecke et al. 1992,
Wegener et al. 1994) in which a ribozyme was designed
against neomycin phosphotransferase mRNA. Separate DNA
constructs encoding the ribozyme and the neomycin
phosphotransferase (npt) gene were used to transform
plants. In plants containing both constructs, a
reduction in neomycin phosphotransferase activity was
observed relative to plants transformed with only the npt
gene construct.

Ribozyme technology also appears to be successful in
other eukaryotes, such as the fruit fly (Zhao and Pick,
1993).



C) Co-suppression or Homology-Dependent Gene Silencing 

When attempts have been made to overexpress
homologous genes in plants, often a small fraction of the
resulting transgenic plants are found to have very low
levels of expression of both the native gene and the
introduced gene (transgene). This phenomenon has been
called co-suppression or homology-dependent gene
silencing (Stam et al. 1996, Matzke and Matzke 1995).
The mechanism by which co-suppression occurs is very
poorly understood. However, advantage can be taken of
the phenomenon to down-regulate the expression of a gene
of interest. This can be accomplished by transforming a
plant with a DNA construct which contains a strong
transcriptional promoter driving the sense transcription
of a DNA sequence with high similarity to the gene of
interest. For example, when the chalcone synthase gene
was introduced into petunia in an attempt to overproduce
chalcone synthase (which is involved in flower pigment
biosynthesis), some transgenic plants showed pigment
patterns and enzyme levels that indicated the suppression
of chalcone synthase gene expression (Jorgensen 1990).
Investigation of examples such as these has shown that
the effect is often associated with repetition of the
transgene inserts in the plant genome. Co-suppression may
be dependent on the coding region of a gene or on the
promoter and other non-coding regions.

Thus, the down-regulation of squalene epoxidase in
plants may be engineered with the use of the cDNA sequences
that are disclosed herein, or with plant genomic
sequences which may include the promoter or promoters of
squalene epoxidase genes.

D) Other variations 

Variations on the process of increasing squalene in
plants include the use of different promoter sequences
which may give rise to increased squalene in other
tissues and at various stages of development. For
example, the use of the cauliflower mosaic virus 35S
promoter is likely to have an effect in most plant
tissues. Other seed-specific and tissue-specific
promoters may also be used.

Also, other plant transformation methods may be used
such as the particle gun technique (Christou 1993).

As well, other vectors, selectable markers,
transcription terminators, etc., may be used (Guerineau
and Mullineaux 1993).

It has already been observed that overexpression of
a fragment of the hamster 3-hydroxy-3-methylglutaryl CoA
reductase (HMGR) gene in plants can elevate squalene
levels in plants (Chappell et al. 1994). This is likely
due to the fact that the level of HMGR limits the flow of
carbon through the mevalonate/sterol pathway that
includes squalene. It would be expected that a
combination of elevated HMGR levels and down-regulated
squalene epoxidase levels would have an effect on raising
squalene levels that would be larger than the effect of
either elevated HMGR alone or down-regulated squalene
epoxidase alone.

Experimental Detail

IDENTIFICATION OF THE SQUALENE EPOXIDASE GENE

The DNA sequence of the squalene epoxidase gene of
yeast was published by Jandrositz et al. (1991). The
predicted amino acid sequence of yeast squalene epoxidase
was used, with the TBLASTN™ computer search program
(Altschul et al. 1990), to search a database called "the
Non-Redundant database", which included partial cDNA
sequences and is maintained by the National Center for
Biotechnology Information (NCBI) in the United States.
This database is a non-redundant nucleotide database made
up of:

    pdb       Brookhaven Protein Data Bank, April 1994 Release
    genbank   Genbank® Release 87.0, February 15, 1995
    gbupdate  Genbank® cumulative updates to genbank major release
    embl      EMBL Data Library, Release 41.0, December 1994
    emblu     EMBL Data Library, cumulative updates to embl major release

(maintained by the National Center for Biotechnology
Information (NCBI), National Library of Medicine,
National Institutes of Health, Bethesda, MD 20894,
U.S.A.).

The database included expressed sequence tags
(ESTs), i.e. partial sequences of more-or-less randomly
chosen cDNA clones. This search identified the
Arabidopsis thaliana cDNA clone 129F12T7 (Genbank
accession no. T44667) as a putative squalene epoxidase
gene. This clone was the seventh highest scoring
sequence in this search and the highest scoring plant
sequence. The P(N) of 1.9 x 10^-5 was considered
borderline significant. The single high-scoring pair
(HSP) of subsequences found was a stretch of 46
nucleotides with 21 positions identical (45%). Searches
with the T44667 sequence revealed that a large portion of
the 46 nucleotide region (29 nucleotides) matches a
sequence motif found in a variety of enzymes that bind
adenine dinucleotides, such as flavin adenine
dinucleotide (FAD; which at least some squalene
epoxidases are known to use as a cofactor; see Wierenga
et al. 1986). So, in fact, the search, done when only
the partial DNA sequence (T44667) was available,
suggested the possibility, but did not confirm, that
T44667 corresponded to a squalene epoxidase gene.

The 129F12T7 clone was obtained and its DNA
sequenced completely by the inventors at the Plant
Biotech Institute of the National Research Council of
Canada at Saskatoon, Saskatchewan, Canada. The DNA
sequence of the cDNA insert of p129F12T7 is shown in the
Sequence Listing (see later) as SEQ ID NO:1. After the
full sequence of the insert of p129F12T7 was obtained,
the Non-Redundant Protein Database (NCBI) was searched
using the BLAST™ software (Altschul et al. 1990) (NCBI)
based on the predicted amino acid sequence. The amino
acid sequence corresponding to the open reading frame of
SEQ ID NO:1 is shown in the Sequence Listing as SEQ ID
NO:2. The Arabidopsis sequence gave the highest scoring
matches with squalene epoxidase sequences including that
of rat (P(N) = 5 x 10^[exponent illegible]) and yeast
(P(N) = 9.2 x 10^[exponent illegible]). No
sequences which had been reliably identified had P(N)
values less than 10^[exponent illegible]. These numbers
indicate that the
product of the Arabidopsis gene is, in all probability,
squalene epoxidase.
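The step from a cDNA sequence to the predicted amino acid sequence of its open reading frame can be sketched as follows. The Python code uses the standard genetic code and, as a simplifying assumption, starts at the first ATG it finds and reads to the first in-frame stop codon; it does not search for the longest open reading frame, and the function name and example sequence are hypothetical rather than taken from the Sequence Listing.

```python
_BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate_first_orf(cdna):
    """Translate from the first ATG to the first in-frame stop codon.

    Returns one-letter amino acid codes; unknown codons become 'X'.
    Returns an empty string if no ATG is present.
    """
    seq = cdna.upper().replace("U", "T")
    start = seq.find("ATG")
    if start < 0:
        return ""
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        amino = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if amino == "*":  # stop codon ends the reading frame
            break
        protein.append(amino)
    return "".join(protein)


if __name__ == "__main__":
    example = "CCATGGCTTGGAAATAAGG"  # made-up sequence for illustration
    print(translate_first_orf(example))
```

A predicted protein sequence obtained in this way is what would be submitted to a protein database search of the kind described above.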

The 129F12T7 clone was used to probe a B. napus cDNA
library, obtained from Dr. Edward Tsang of the Plant
Biotech Institute. Two independent clones, pDR111 and
pDR411, were isolated and sequenced. The Sequence Listing
shows the DNA sequences of the cDNA inserts of pDR111
[SEQ ID NO:3] and pDR411 [SEQ ID NO:5] and the amino acid
sequences corresponding to the coding regions of SEQ ID
NO:3 [SEQ ID NO:4] and SEQ ID NO:5 [SEQ ID NO:11].
pDR111 and pDR411 have similar (but not identical) DNA
sequences which are also similar to the 129F12T7
sequence. Plasmids p129F12T7, pDR111 and pDR411 were
deposited at the American Type Culture Collection (ATCC),
12901 Parklawn Drive, Rockville, Maryland 20852-1776,
USA, under the terms of the Budapest Treaty on January 9,
1997 and were accepted. The deposit numbers are,
respectively, ATCC 97847, ATCC 97846 and ATCC 97845. A
single deposit receipt and statement of viability was
issued for all three deposits on January 17, 1997.

Figure 1 of the accompanying drawings shows an
alignment of the amino acid sequences for the 129F12T7 clone
[SEQ ID NO:2], the pDR111 clone [SEQ ID NO:4] and the
pDR411 clone [SEQ ID NO:11], along with the squalene
epoxidase amino acid sequences for mouse [SEQ
ID NO:6], rat [SEQ ID NO:7] and yeast [SEQ ID NO:8]. The
plant sequences show blocks of high similarity to the
non-plant sequences, including the region thought to
correspond to an adenine dinucleotide-binding site
(residues 45-88 of the Arabidopsis sequence; Wierenga et
al. 1986; Sakakibara et al. 1995), as well as in the
C-terminal half of the sequence. The amino acid sequence
similarities based on this alignment are shown in Table 1
below.

Table 1

Amino acid sequence similarities calculated by MEGALIGN™
software for the alignment of Figure 1.

             pDR411   p129F12T7   Mouse   Rat    Yeast
pDR111        74.8      59.6      27.0    26.4   21.5
pDR411                  62.9      29.2    27.8   21.3
p129F12T7                         27.3    26.1   20.9
Mouse                                     91.8   30.4
Rat                                              30.4

All entries refer to predicted amino acid sequences; Mouse, Rat
and Yeast denote the respective squalene epoxidases.


Analysis of the pDR411 sequence suggests it has an
intron in the 3'-end of its amino acid coding region,
which is, of course, unusual in cDNA. If nucleotides
1473-1629 (inclusive) are removed from the sequence and
the cDNA translated, the C-terminus is more similar to
the pDR111 and p129F12T7 amino acid sequences [SEQ ID
NO:4 and SEQ ID NO:2]. Also, there are sequence patterns
in this region that are common to other plant introns (5'
and 3' splice consensus sequences and high AT content;
Goodall and Filipowicz, 1991). This may mean that the
pDR411 clone represents an intermediate or precursor RNA,
rather than the final messenger RNA (mRNA). There can
therefore be less certainty in predicting the full amino
acid sequence corresponding to pDR411, although this
predicted sequence is shown in Fig. 1 [SEQ ID NO:11].
However, the possible presence of a small intron in the
3'-end of pDR411 does not cause a problem for its use in
antisense techniques.
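
The effect of removing a putative intron before translation can be illustrated with a short Python sketch. It assumes the standard genetic code and 1-based inclusive coordinates as used in the text; the function names and the toy sequence are hypothetical and are not the pDR411 sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def excise(seq: str, start: int, end: int) -> str:
    """Remove positions start..end (1-based, inclusive) from seq."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates out of range")
    return seq[:start - 1] + seq[end:]


def translate(seq: str, cds_start: int = 1) -> str:
    """Translate from cds_start (1-based) up to the first stop codon."""
    protein = []
    for i in range(cds_start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy example: exon "ATGGCT" + 12-nt insert + exon "TGGTAA".
toy = "ATGGCT" + "GTAAGTTTTCAG" + "TGGTAA"
unspliced = translate(toy)
spliced = translate(excise(toy, 7, 18))

# Length of the region described in the text (nucleotides 1473-1629).
putative_intron_length = 1629 - 1473 + 1

print(unspliced, spliced, putative_intron_length)
```

The region named in the text is 157 nucleotides long, which is not a multiple of three, so its removal also changes the downstream reading frame; this is consistent with the altered C-terminus described above.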

Employing the plant squalene epoxidase sequences,
transgenic plants can be generated which accumulate
squalene in their seeds. This can be done by established
genetic transformation methods using DNA constructs that
include the napin or other seed-specific promoters
(Kridl, 1988; Anonymous, 1995) and fragments of plant
squalene epoxidase genes arranged in the antisense
orientation. Downregulation of the squalene epoxidase
gene in seeds by antisense technology (Inouye, 1990;
Bourque, 1995) will prevent the conversion of squalene to
squalene epoxide and result in squalene accumulation.
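
Placing a fragment in the antisense orientation means that the transcribed strand is the reverse complement of the coding strand. A minimal Python sketch of that operation follows; the helper name is an assumption, and the 12-nucleotide example is simply the first four codons of the coding region of SEQ ID NO:1, used for illustration only.

```python
# Translation table for complementing DNA bases (case preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


sense = "ATGACTTACGCG"          # ATG ACT TAC GCG (Met Thr Tyr Ala)
antisense = reverse_complement(sense)

print(sense, "->", antisense)
```

Applying the function twice returns the original sequence, which is a convenient sanity check when preparing antisense inserts in silico.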

ISOLATION OF SQUALENE EPOXIDASE GENE IN B. NAPUS

The 129F12T7 clone obtained as described above was
used to probe for the homologous gene in B. napus as
follows.

Unless otherwise noted, all molecular biology methods
were performed as described in Ausubel et al. (1994).

The Arabidopsis 129F12T7 DNA Probe

The plasmid p129F12T7 was digested with the
restriction enzymes Sal I and Not I. The resulting DNA
fragments were separated by agarose gel electrophoresis.
The 1.8 kb Sal I/Not I DNA fragment corresponding to the
Arabidopsis squalene epoxidase cDNA was purified from a
gel band. A radiolabeled DNA probe was prepared by the
random priming method using [alpha-32P]-dCTP (deoxycytidine
triphosphate).

Library Screening

The probe produced as above was used to screen a B.
napus cDNA library, kindly provided by Dr. Edward Tsang
of the Plant Biotechnology Institute (Saskatoon,
Saskatchewan, Canada). To construct the library, B.
napus seedlings (cv. Westar) were grown on half-strength
Murashige and Skoog agar (1%) medium supplemented with 1%
sucrose in the dark at 22°C for two weeks after
germination and then exposed to light for 24 hours. PolyA+
RNA was extracted from the seedlings and first strand
cDNA synthesis was primed with an oligo dT/Not I
adapter/primer. Sal I adapters were ligated after second
strand cDNA synthesis and a library was constructed in
Not I/Sal I arms of the Lambda ZipLox vector (Life
Technologies).

The library was plated using standard methods and
the Y1090 strain of E. coli. Approximately 25,000
plaques from the library were plated, lifted onto
Hybond®-C nylon membranes (Amersham) and hybridized with
the above probe according to the manufacturer's
instructions. After two rounds of plaque purification,
two independent clones, pDR111 and pDR411, were isolated
by in vivo excision.

The p129F12T7, pDR111 and pDR411 clones were
sequenced using the PRISM® DyeDeoxy Terminator Cycle
Sequencing System (Perkin Elmer/Applied Biosystems) and a
Model 373 DNA Sequencer (Applied Biosystems). DNA
sequences were assembled and analyzed using the
Lasergene® suite of software (DNASTAR, Inc.) and BLAST®
and related software of the NCBI.


CONSTRUCTION OF VECTORS FOR PLANT TRANSFORMATION 

Figs. 2, 3 and 4 show three vectors constructed for
plant transformation, namely pSE129A, pSE111A and
pSE411A. In these drawings, the following abbreviations
are used:

nosT      3'-terminus of the nopaline synthase gene
SE129     Sal I/Not I insert of p129F12T7
SE111     Sal I/Xba I fragment of the insert of pDR111
SE411     Sal I/Not I insert of pDR411
Napin P   napin gene promoter (Josefsson 1986).

All other elements are described by Guerineau and
Mullineaux (1993), Thomas et al. (1992) and Bevan (1984).

These plasmids were constructed as follows.

pDH1

The plasmid pE35SNT was obtained from Raju Datla
(Plant Biotechnology Institute, Saskatoon, Saskatchewan,
Canada). It contains a double 35S promoter and nopaline
synthase (Nos) terminator (Datla, 1992) in pUC19. It was
digested with Hind III and Xba I to remove the double 35S
promoter. The napin promoter (Josefsson et al. 1987) was
isolated from pNap (obtained from Ravi Jain, Plant
Biotechnology Institute, Saskatoon, Saskatchewan, Canada)
by Hind III and Xba I digestion. The plasmid pDH1 was
produced by ligation of the large pE35SNT/Hind III/Xba I
fragment and the Hind III/Xba I napin promoter fragment.
Thus, pDH1 contained the napin promoter and the Nos
terminator between the Hind III and EcoR I sites of the
pUC19 vector.

pSE129A

The p129F12T7 plasmid was digested with Pst I and
Hind III. The fragment containing the Arabidopsis
squalene epoxidase cDNA was ligated to the Pst I- and
Hind III-digested vector pTrcHisB (INVITROGEN®) to give
the circular plasmid pTrcHis129. pTrcHis129 was digested
with Xba I and BamH I and the squalene epoxidase cDNA
fragment was ligated into Xba I- and BamH I-digested
pDH1. The resulting plasmid pDH129A contained the
squalene epoxidase cDNA in antisense orientation
downstream from the napin promoter and upstream of the
Nos terminator. pDH129A was digested with Hind III and
partially digested with EcoR I, and the fragment containing
the napin promoter, squalene epoxidase cDNA and Nos
terminator was ligated into Hind III- and EcoR I-digested
pRD400 (a binary vector for plant transformation
containing a gene conferring kanamycin resistance; Datla
et al. 1992) to give pSE129A.

pSE111A

The pDR111 plasmid was digested with Sma I and Xba
I. The fragment containing a B. napus squalene epoxidase
cDNA (excluding a small part of the 3'-end downstream of
the Xba I site) was ligated to the large fragment of Sma
I- and Xba I-digested pDH129A vector (containing the napin
promoter and Nos terminator) to give the circular plasmid
pDH111A. pDH111A contained the squalene epoxidase cDNA
in antisense orientation downstream from the napin
promoter and upstream of the Nos terminator. pDH111A was
digested with Hind III and partially with EcoR I, and the
fragment containing the napin promoter, cDNA and Nos
terminator was ligated into Hind III- and EcoR I-digested
pRD400 to give pSE111A.

pSE411A

The pDR411 plasmid was digested with Sma I and Xba I.
The fragment containing a B. napus squalene epoxidase
cDNA was ligated to the large fragment of Sma I- and Xba
I-digested pDH129A vector (containing the napin promoter
and Nos terminator and excluding the Arabidopsis cDNA
sequence) to give the circular plasmid pDH411A. pDH411A
contained the squalene epoxidase cDNA in antisense
orientation downstream from the napin promoter and
upstream of the Nos terminator. pDH411A was digested with
EcoR I and partially digested with Hind III, and the
fragment containing the napin promoter, squalene epoxidase
cDNA and Nos terminator was ligated into Hind III- and
EcoR I-digested pRD400 (Datla et al. 1992) to give
pSE411A.

The final vectors pSE129A, pSE111A and pSE411A were
deposited on March 5, 1997 under the terms of the
Budapest Treaty at the American Type Culture Collection,
12301 Parklawn Drive, Rockville, MD 20852, USA, under
deposit nos. ATCC 97910, ATCC 97909 and ATCC 97908,
respectively. These vectors were introduced into
Agrobacterium tumefaciens strain GV3101 (bearing helper
plasmid pMP90; Koncz and Schell, 1986) by
electroporation.

PLANT GROWTH CONDITIONS

All A. thaliana control and transgenic plants were
grown in controlled growth chambers, under continuous
fluorescent illumination (150-200 µE m^-2 sec^-1) at 22°C,
as described by Katavic et al. (1995).

PLANT TRANSFORMATION

The pSE129A construct was tested in A. thaliana by
in planta transformation techniques.

Wild type (WT) A. thaliana plants of ecotype
Columbia were grown in soil. In planta transformation
was performed by vacuum infiltration (Bechtold et al.
1993) with an overnight bacterial suspension of A.
tumefaciens strain GV3101 bearing helper nopaline plasmid
pMP90 (disarmed Ti plasmid with intact vir region acting
in trans, gentamycin and kanamycin selection markers;
Koncz and Schell, 1986) and binary vector pSE129A.

After infiltration, plants were grown to set seeds
(T1 generation). Dry seeds (T1 generation of seeds) were
harvested in bulk and screened on selective medium with
50 mg/L kanamycin. After two to three weeks on selective
medium, surviving seedlings were transferred to soil.
Mature seeds from these seedlings (T2 seeds) were used
for squalene analysis. Mature seeds from untransformed
wild type (WT) Columbia plants and pRD400 transgenic
plants (binary vector pRD400, containing only the kanamycin
selection marker; Datla et al. 1992) were used as
controls in analyses of seed lipids.

Seed Analysis

Seeds were analyzed for squalene levels as follows.

In all steps, care was taken to avoid contamination
from external sources, particularly human skin. 5-10 mg of
Arabidopsis seeds were weighed and rinsed with hexane to
remove any external contamination. 1 ml of 7.5% KOH (in
95% methanol) was added to each sample and 250 ng of
squalane was added as internal standard. (Squalane is
the hydrogenated form of squalene.) Seeds were
homogenized with a Polytron® (Model PRO200, PRO
Scientific) at maximum speed for 40 seconds. The head of
the Polytron was washed with 1 ml of 7.5% KOH (in 95%
methanol) and the wash was pooled with the homogenate.
The mixture was incubated at 80°C for 1 hr, then cooled
to room temperature. The mixture was centrifuged at 3000
g for 5 min, and the supernatant was transferred to a
fresh tube. One ml of H2O and 1.5 ml of hexane were
added to the supernatant and, after vortexing, the
mixture was centrifuged at 3000 g for 5 minutes. The
hexane (top) layer was transferred to another test tube.
The aqueous phase was re-extracted with 1.5 ml hexane and
the hexane fractions were pooled. The hexane fraction
was extracted with 1 ml of water/methanol/KOH (50:50:2)
and evaporated under nitrogen. The residue was dissolved
in 50 µl of hexane and transferred to an autosampler
vial. Gas-liquid chromatography was performed with a DB-5
column (J & W Scientific, USA) using the following
parameters:

Column Temperature     0-1 min     180°C
                       1-16 min    180-280°C (linear ramp)
                       16-30 min   280°C
Injector Temperature   275°C
Detector Temperature   300°C

Transgenic Results

Seeds from 9 Arabidopsis lines transformed with
pRD400 and 55 lines transformed with pSE129A were
analyzed for squalene content. Table 2 below shows the
results for all of the pRD400 transgenic lines and 4
pSE129A lines.

Table 2

Line    Vector     Squalene (µg/g dry weight)   Standard Deviation
k401    pRD400      4.04                        0.5
k402    pRD400      4.71                        0.16
k403    pRD400      4.39                        0.34
k404    pRD400      4.86                        0.75
k405    pRD400      3.92                        0.92
k406    pRD400      4.04                        1.68
k409    pRD400      5.03                        0.85
k410    pRD400      6.09                        1.22
k411    pRD400      4.57                        1.26

k9      pSE129A     9.96                        1.59
k12     pSE129A    11.34                        2.01
k50     pSE129A    12.38                        0.35
k54     pSE129A     9.76                        1.43

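The summary statistics for the pRD400 control lines can be recomputed from the Table 2 values. The Python sketch below assumes that the sample (n-1) standard deviation was used, which is an assumption about the original calculation; the variable names and the fold-increase figure are illustrative additions and are not stated in the specification.

```python
from statistics import mean, stdev

# Squalene content (ug/g dry weight) transcribed from Table 2.
control = [4.04, 4.71, 4.39, 4.86, 3.92, 4.04, 5.03, 6.09, 4.57]  # pRD400
antisense = [9.96, 11.34, 12.38, 9.76]                            # pSE129A

control_mean = mean(control)
control_sd = stdev(control)          # sample standard deviation (n - 1)
antisense_mean = mean(antisense)

# Ratio of the four listed pSE129A lines to the control mean.
fold_increase = antisense_mean / control_mean

print(f"control: {control_mean:.1f} +/- {control_sd:.1f}")
print(f"pSE129A (4 listed lines): {antisense_mean:.2f}, "
      f"{fold_increase:.1f}-fold over control")
```

Only four of the 55 pSE129A lines are listed in Table 2, so the ratio computed here describes those four lines rather than the whole transgenic population.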
The mean and standard deviation of the 9 pRD400 lines were
4.6 and 0.7, respectively.

SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: National Research Council of Canada 

(B) STREET: 1200 Montreal Road 

(C) CITY: Ottawa 

(D) STATE: Ontario 

(E) COUNTRY: Canada 

(F) POSTAL CODE: K1A 0R6 

(G) TELEPHONE: (613) 993-3899 

(H) TELEFAX: (613) 952-6082 

(A) NAME: Dr. Patrick S. Covello 

(B) STREET: 40 Weir Crescent 

(C) CITY: Saskatoon 

(D) STATE: SK 

(E) COUNTRY: Canada

(F) POSTAL CODE (ZIP): S7H 3A9
(G) TELEPHONE: (306) 975-5269
(H) TELEFAX: (306) 975-4839

(A) NAME: Dr. Martin J.T. Reaney 

(B) STREET: 1027 5th Street East 

(C) CITY: Saskatoon 

(D) STATE: SK 

(E) COUNTRY: Canada 

(F) POSTAL CODE (ZIP) : S7H 1H3 

(A) NAME: Dr. Samuel L. MacKenzie 

(B) STREET: 17 Cambridge Crescent 

(C) CITY: Saskatoon 

(D) STATE: SK 

(E) COUNTRY: Canada 

(F) POSTAL CODE (ZIP) : S7H 3P9 

(ii) TITLE OF INVENTION: Process for Raising Squalene Levels in Plants
and DNA Sequences Used Therefor

(iii) NUMBER OF SEQUENCES: 11

(iv) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)

(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1756 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA to mRNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Arabidopsis thaliana 

(B) STRAIN: Columbia 

(D) DEVELOPMENTAL STAGE: 3 different stages 
(F) TISSUE TYPE: 4 different tissues 

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: Lambda -PRL2 

(B) CLONE: 129F12T7 

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 15..1565

(D) OTHER INFORMATION: /codon_start= 15
/function= "converts squalene to
2,3-oxidosqualene"
/EC_number= 1.14.99.7
/product= "squalene epoxidase"
/standard_name= "squalene monooxygenase
(2,3-epoxidizing)"

(ix) FEATURE:

(A) NAME/KEY: 3'UTR

(B) LOCATION: 1566..1756

(ix) FEATURE:

(A) NAME/KEY: polyA_site

(B) LOCATION: 1756

(ix) FEATURE:

(A) NAME/KEY: 5'UTR

(B) LOCATION: 1..14
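
The CDS coordinates in a feature table can be cross-checked against the stated protein length. A minimal Python sketch follows, assuming that the CDS range is 1-based and inclusive and that it includes the stop codon (as the TGA at the end of the coding region of SEQ ID NO:1 suggests); the function name is hypothetical.

```python
def cds_residues(start: int, end: int) -> int:
    """Number of encoded residues for a CDS range that includes the stop codon."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("CDS length must be a positive multiple of three")
    return length // 3 - 1  # subtract one codon for the stop


# SEQ ID NO:1 (CDS 15..1565) against SEQ ID NO:2 (516 amino acids).
seq1_residues = cds_residues(15, 1565)

# SEQ ID NO:3 (CDS 19..1575) against SEQ ID NO:4 (518 amino acids).
seq3_residues = cds_residues(19, 1575)

print(seq1_residues, seq3_residues)  # 516 518
```

Both values agree with the lengths given in the characteristics of SEQ ID NO:2 and SEQ ID NO:4.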



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

CCACGCGTCC GGCA ATG ACT TAC GCG TGG TTA TGG ACG CTT CTC GCC TTT    50
                Met Thr Tyr Ala Trp Leu Trp Thr Leu Leu Ala Phe
                1               5                   10

GTT CTG ACA TGG ATG GTT TTT CAC CTC ATC AAG ATG AAG AAG GCG GCA    98
Val Leu Thr Trp Met Val Phe His Leu Ile Lys Met Lys Lys Ala Ala
        15                  20                  25

ACC GGA GAT TTA GAG GCC GAG GCA GAA GCA AGA AGA GAT GGT GCA ACG   146
Thr Gly Asp Leu Glu Ala Glu Ala Glu Ala Arg Arg Asp Gly Ala Thr
    30                  35                  40

GAT GTC ATC ATT GTT GGG GCG GGT GTT GCA GGC GCT TCT CTT GCT TAT   194
Asp Val Ile Ile Val Gly Ala Gly Val Ala Gly Ala Ser Leu Ala Tyr
45                  50                  55                  60

GCT TTA GCT AAG GAT GGA CGA CGA GTA CAT GTG ATA GAG AGG GAC TTA   242
Ala Leu Ala Lys Asp Gly Arg Arg Val His Val Ile Glu Arg Asp Leu
                65                  70                  75

AAA GAG CCA CAA AGA TTC ATG GGA GAG CTG ATG CAA GCG GGA GGT CGC   290
Lys Glu Pro Gln Arg Phe Met Gly Glu Leu Met Gln Ala Gly Gly Arg
            80                  85                  90

TTC ATG TTA GCC CAG CTT GGC CTC GAA GAT TGT TTG GAG GAC ATA GAC   338
Phe Met Leu Ala Gln Leu Gly Leu Glu Asp Cys Leu Glu Asp Ile Asp
        95                  100                 105

GCA CAA GAA GCG AAG TCC TTG GCA ATA TAC AAG GAT GGA AAA CAC GCG   386
Ala Gln Glu Ala Lys Ser Leu Ala Ile Tyr Lys Asp Gly Lys His Ala
    110                 115                 120

ACA TTG CCT TTT CCA GAT GAC AAG ACT TTT CCT CAT GAG CCA GTA GGT   434
Thr Leu Pro Phe Pro Asp Asp Lys Ser Phe Pro His Glu Pro Val Gly
125                 130                 135                 140

AGA CTC TTA CGT AAT GGT CGG CTG GTA CAA CGT TTA CGC CAA AAA GCA   482
Arg Leu Leu Arg Asn Gly Arg Leu Val Gln Arg Leu Arg Gln Lys Ala
                145                 150                 155

GCT TCT CTT AGC AAT GTT CAA TTA GAA GAA GGA ACA GTG AAG TCT TTA   530
Ala Ser Leu Ser Asn Val Gln Leu Glu Glu Gly Thr Val Lys Ser Leu
            160                 165                 170

ATT GAA GAA GAA GGA GTG GTC AAA GGA GTG ACA TAC AAA AAT AGC GCA   578
Ile Glu Glu Glu Gly Val Val Lys Gly Val Thr Tyr Lys Asn Ser Ala
        175                 180                 185

GGC GAA GAA ATA ACG GCC TTT GCA CCT CTT ACT GTC GTA TGC GAT GGT   626
Gly Glu Glu Ile Thr Ala Phe Ala Pro Leu Thr Val Val Cys Asp Gly
    190                 195                 200

TGT TAT TCG AAC CTT CGT CGG TCA CTC GTG GAT AAT ACT GAG GAA GTC   674
Cys Tyr Ser Asn Leu Arg Arg Ser Leu Val Asp Asn Thr Glu Glu Val
205                 210                 215                 220

CTC TCG TAC ATG GTG GGT TAC GTC ACG AAG AAT AGC CGA CTT GAA GAT   722
Leu Ser Tyr Met Val Gly Tyr Val Thr Lys Asn Ser Arg Leu Glu Asp
                225                 230                 235


WO 97/34003 



PCT/CA97/00175 



-32- 

CCC CAT AGT CTA CAT TTG ATA TTT TCT AAA CCT TTG GTT TGT GTT ATA   770
Pro His Ser Leu His Leu Ile Phe Ser Lys Pro Leu Val Cys Val Ile
            240                 245                 250

TAT CAA ATA ACC AGT GAT GAA GTT CGT TGT GTT GCC GAA GTT CCC GCT   818
Tyr Gln Ile Thr Ser Asp Glu Val Arg Cys Val Ala Glu Val Pro Ala
        255                 260                 265

GAT AGT ATT CCT TCT ATA TCG AAT GGT GAA ATG TCT ACC TTC CTC AAG   866
Asp Ser Ile Pro Ser Ile Ser Asn Gly Glu Met Ser Thr Phe Leu Lys
    270                 275                 280

AAA TCA ATG GCT CCT CAG ATA CCT GAA ACT GGA AAT CTT CGG GAG ATA   914
Lys Ser Met Ala Pro Gln Ile Pro Glu Thr Gly Asn Leu Arg Glu Ile
285                 290                 295                 300

TTT TTG AAA GGC ATA GAG GAA GGA TTA CCA GAG ATA AAA TCA ACA GCG   962
Phe Leu Lys Gly Ile Glu Glu Gly Leu Pro Glu Ile Lys Ser Thr Ala
                305                 310                 315

ACG AAA AGT ATG TCA TCG AGA TTG TGT GAT AAA AGA GGA GTG ATT GTG  1010
Thr Lys Ser Met Ser Ser Arg Leu Cys Asp Lys Arg Gly Val Ile Val
            320                 325                 330

TTG GGA GAT GCA TTC AAT ATG CGT CAT CCT ATA ATC GCG TCA GGA ATG  1058
Leu Gly Asp Ala Phe Asn Met Arg His Pro Ile Ile Ala Ser Gly Met
        335                 340                 345

ATG GTT GCA CTC TCG GAC ATT TGC ATT CTA CGC AAT CTT CTC AAA CCA  1106
Met Val Ala Leu Ser Asp Ile Cys Ile Leu Arg Asn Leu Leu Lys Pro
    350                 355                 360

TTG CCT AAC CTC AGC AAT ACT AAG AAA GTC TCT GAT CTT GTC AAG TCC  1154
Leu Pro Asn Leu Ser Asn Thr Lys Lys Val Ser Asp Leu Val Lys Ser
365                 370                 375                 380

TTT TAC ATC ATC CGC AAG CCA ATG TCA GCG ACC GTG AAC ACG CTC GCG  1202
Phe Tyr Ile Ile Arg Lys Pro Met Ser Ala Thr Val Asn Thr Leu Ala
                385                 390                 395

AGT ATC TTT TCA CAA GTG CTT GTT GCT ACA ACA GAC GAA GCA AGA GAG  1250
Ser Ile Phe Ser Gln Val Leu Val Ala Thr Thr Asp Glu Ala Arg Glu
            400                 405                 410

GGA ATG CGA CAA GGC TGC TTC AAT TAC CTA GCT CGT GGA GAT TTT AAA  1298
Gly Met Arg Gln Gly Cys Phe Asn Tyr Leu Ala Arg Gly Asp Phe Lys
        415                 420                 425

ACA AGG GGA TTG ATG ACT ATT CTC GGA GGC ATG AAC CCT CAC CCT CTT  1346
Thr Arg Gly Leu Met Thr Ile Leu Gly Gly Met Asn Pro His Pro Leu
    430                 435                 440

ACT CTA GTC CTT CAT CTT GTA GCC ATC ACC CTT ACG TCC ATG GGC CAC  1394
Thr Leu Val Leu His Leu Val Ala Ile Thr Leu Thr Ser Met Gly His
445                 450                 455                 460

TTG CTC TCT CCG TTT CCT TCG CCT CGT CGC TTT TGG CAT AGC CTC AGA  1442
Leu Leu Ser Pro Phe Pro Ser Pro Arg Arg Phe Trp His Ser Leu Arg
                465                 470                 475

ATT CTT GCC TGG GCT TTG CAA ATG TTG GGT GCA CAT TTA GTG GAT GAA  1490
Ile Leu Ala Trp Ala Leu Gln Met Leu Gly Ala His Leu Val Asp Glu
            480                 485                 490

GGA TTC AAG GAA ATG TTG ATT CCA ACA AAC GCA GCT GCT TAT CGA AGG  1538
Gly Phe Lys Glu Met Leu Ile Pro Thr Asn Ala Ala Ala Tyr Arg Arg
        495                 500                 505

AAC TAT ATC GCC ACA ACC ACT GTT TGA TCAATCCATA ACACGAAGAC        1585
Asn Tyr Ile Ala Thr Thr Thr Val
    510                 515

TGTTTTATTC GGAGATGAAA AATAACAACT CAAACAGTTA ACTTTCTACA ACCAAATAAA  1645
TAATTGTGTG TATATGAAGT TGAGCCTATG GTTAAGCTCT ACTGAATTGT GTTGAAAACA  1705
AACATGGATA TGTTATATGC TAATTTGTTA TATTCTATTT ATTGATTCTT G  1756



(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 516 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Thr Tyr Ala Trp Leu Trp Thr Leu Leu Ala Phe Val Leu Thr Trp
1               5                   10                  15

Met Val Phe His Leu Ile Lys Met Lys Lys Ala Ala Thr Gly Asp Leu
            20                  25                  30

Glu Ala Glu Ala Glu Ala Arg Arg Asp Gly Ala Thr Asp Val Ile Ile
        35                  40                  45

Val Gly Ala Gly Val Ala Gly Ala Ser Leu Ala Tyr Ala Leu Ala Lys
    50                  55                  60

Asp Gly Arg Arg Val His Val Ile Glu Arg Asp Leu Lys Glu Pro Gln
65                  70                  75                  80

Arg Phe Met Gly Glu Leu Met Gln Ala Gly Gly Arg Phe Met Leu Ala
                85                  90                  95

Gln Leu Gly Leu Glu Asp Cys Leu Glu Asp Ile Asp Ala Gln Glu Ala
            100                 105                 110

Lys Ser Leu Ala Ile Tyr Lys Asp Gly Lys His Ala Thr Leu Pro Phe
        115                 120                 125

Pro Asp Asp Lys Ser Phe Pro His Glu Pro Val Gly Arg Leu Leu Arg
    130                 135                 140

Asn Gly Arg Leu Val Gln Arg Leu Arg Gln Lys Ala Ala Ser Leu Ser
145                 150                 155                 160

Asn Val Gln Leu Glu Glu Gly Thr Val Lys Ser Leu Ile Glu Glu Glu
                165                 170                 175

Gly Val Val Lys Gly Val Thr Tyr Lys Asn Ser Ala Gly Glu Glu Ile
            180                 185                 190

Thr Ala Phe Ala Pro Leu Thr Val Val Cys Asp Gly Cys Tyr Ser Asn
        195                 200                 205

Leu Arg Arg Ser Leu Val Asp Asn Thr Glu Glu Val Leu Ser Tyr Met
    210                 215                 220

Val Gly Tyr Val Thr Lys Asn Ser Arg Leu Glu Asp Pro His Ser Leu
225                 230                 235                 240

His Leu Ile Phe Ser Lys Pro Leu Val Cys Val Ile Tyr Gln Ile Thr
                245                 250                 255

Ser Asp Glu Val Arg Cys Val Ala Glu Val Pro Ala Asp Ser Ile Pro
            260                 265                 270

Ser Ile Ser Asn Gly Glu Met Ser Thr Phe Leu Lys Lys Ser Met Ala
        275                 280                 285

Pro Gln Ile Pro Glu Thr Gly Asn Leu Arg Glu Ile Phe Leu Lys Gly
    290                 295                 300

Ile Glu Glu Gly Leu Pro Glu Ile Lys Ser Thr Ala Thr Lys Ser Met
305                 310                 315                 320

Ser Ser Arg Leu Cys Asp Lys Arg Gly Val Ile Val Leu Gly Asp Ala
                325                 330                 335

Phe Asn Met Arg His Pro Ile Ile Ala Ser Gly Met Met Val Ala Leu
            340                 345                 350

Ser Asp Ile Cys Ile Leu Arg Asn Leu Leu Lys Pro Leu Pro Asn Leu
        355                 360                 365

Ser Asn Thr Lys Lys Val Ser Asp Leu Val Lys Ser Phe Tyr Ile Ile
    370                 375                 380

Arg Lys Pro Met Ser Ala Thr Val Asn Thr Leu Ala Ser Ile Phe Ser
385                 390                 395                 400

Gln Val Leu Val Ala Thr Thr Asp Glu Ala Arg Glu Gly Met Arg Gln
                405                 410                 415

Gly Cys Phe Asn Tyr Leu Ala Arg Gly Asp Phe Lys Thr Arg Gly Leu
            420                 425                 430

Met Thr Ile Leu Gly Gly Met Asn Pro His Pro Leu Thr Leu Val Leu
        435                 440                 445

His Leu Val Ala Ile Thr Leu Thr Ser Met Gly His Leu Leu Ser Pro
    450                 455                 460

Phe Pro Ser Pro Arg Arg Phe Trp His Ser Leu Arg Ile Leu Ala Trp
465                 470                 475                 480

Ala Leu Gln Met Leu Gly Ala His Leu Val Asp Glu Gly Phe Lys Glu
                485                 490                 495

Met Leu Ile Pro Thr Asn Ala Ala Ala Tyr Arg Arg Asn Tyr Ile Ala
            500                 505                 510

Thr Thr Thr Val
        515

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1748 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA to mRNA 



(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Brassica napus

(B) STRAIN: Westar

(D) DEVELOPMENTAL STAGE: 14 day greening-etiolated
(F) TISSUE TYPE: hypocotyls

(vii) IMMEDIATE SOURCE:

(A) LIBRARY: Tsang

(B) CLONE: pDR111

(ix) FEATURE:

(A) NAME/KEY: 5'UTR

(B) LOCATION: 1..18

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 19..1575

(ix) FEATURE:

(A) NAME/KEY: 3'UTR

(B) LOCATION: 1576..1746



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

CCACGCGTCC GAAAAGAT ATG GAT ATG GCT TTT GTG GAA GTT TGT TTA CGG    51
                    Met Asp Met Ala Phe Val Glu Val Cys Leu Arg
                            520                 525

ATG CTA CTT GTC TTC GTA CTG TCT TGG ACG ATA TTT CAC GTC AAC AAC    99
Met Leu Leu Val Phe Val Leu Ser Trp Thr Ile Phe His Val Asn Asn
    530                 535                 540

AGG AAG AAG AAG AAG GCG ACG AAG TTG GCG GAT CTG GCT ACT GAG GAG   147
Arg Lys Lys Lys Lys Ala Thr Lys Leu Ala Asp Leu Ala Thr Glu Glu
545                 550                 555                 560

AGA AAA GAA GGT GGC CCT GAC GTC ATA ATA GTC GGA GCT GGA GTG GGC   195
Arg Lys Glu Gly Gly Pro Asp Val Ile Ile Val Gly Ala Gly Val Gly
                565                 570                 575

GGC TCA GCT CTC GCC TAT GCT CTT GCT AAG GAC GGG CGT CGA GTA CAT   243
Gly Ser Ala Leu Ala Tyr Ala Leu Ala Lys Asp Gly Arg Arg Val His
            580                 585                 590

GTG ATA GAA AGA GAC ATG AGA GAG CCA GTG AGA ATG ATG GGT GAG TTC   291
Val Ile Glu Arg Asp Met Arg Glu Pro Val Arg Met Met Gly Glu Phe
        595                 600                 605

ATG CAG CCA GGA GGA CGG CTC ATG CTT TCT AAG CTC GGT CTT CAA GAT   339
Met Gln Pro Gly Gly Arg Leu Met Leu Ser Lys Leu Gly Leu Gln Asp
    610                 615                 620

TGT TTA GAG GAA ATA GAC GCA CAG AAA TCC ACC GGC ATA AGA CTT TTT   387
Cys Leu Glu Glu Ile Asp Ala Gln Lys Ser Thr Gly Ile Arg Leu Phe
625                 630                 635                 640

AAG GAC GGA AAA GAA ACT GTC GCA TGT TTT CCG GTG GAC ACC AAC TTT   435
Lys Asp Gly Lys Glu Thr Val Ala Cys Phe Pro Val Asp Thr Asn Phe
                645                 650                 655

CCT TAT GAA CCA TCT GGT CGA TTT TTT CAC AAT GGC CGT TTT GTC CAG   483
Pro Tyr Glu Pro Ser Gly Arg Phe Phe His Asn Gly Arg Phe Val Gln
            660                 665                 670

AGA CTG CGC CAA AAG GCC TCT TCT CTT CCC AAT GTG CGG CTG GAA GAA   531
Arg Leu Arg Gln Lys Ala Ser Ser Leu Pro Asn Val Arg Leu Glu Glu
        675                 680                 685

GGG ACC GTC CGA TCT TTG ATA GAA GAA AAA GGA GTG GTC AAA GGA GTG   579
Gly Thr Val Arg Ser Leu Ile Glu Glu Lys Gly Val Val Lys Gly Val
    690                 695                 700

AGA TAC AAG AAC AGT TCA GGG GAA GAA ACC ACA TCA TTT GCA CCT CTC   627
Thr Tyr Lys Asn Ser Ser Gly Glu Glu Thr Thr Ser Phe Ala Pro Leu
705                 710                 715                 720

ACT GTC GTA TGC GAT GGT TGC CAC TCG AAC CTT CGT CGC TCT CTA AAT   675
Thr Val Val Cys Asp Gly Cys His Ser Asn Leu Arg Arg Ser Leu Asn
                725                 730                 735

GAC AAC AAT GCG GAG GTT ACG GCG TAC GAG ATT GGT TAC ATC TCG AGG   723
Asp Asn Asn Ala Glu Val Thr Ala Tyr Glu Ile Gly Tyr Ile Ser Arg
            740                 745                 750

AAT TGT CGC CTT GAA CAG CCC GAC AAG TTA CAC TTG ATA ATG GCT AAA   771
Asn Cys Arg Leu Glu Gln Pro Asp Lys Leu His Leu Ile Met Ala Lys
        755                 760                 765

CCG TCT TTC GCC ATG TTG TAT CAA GTC AGC AGC ACC GAC GTT CGT TGT   819
Pro Ser Phe Ala Met Leu Tyr Gln Val Ser Ser Thr Asp Val Arg Cys
    770                 775                 780

AAT TTT GAG CTT CTC TCC AAA AAT CTT CCT TCT GTT TCA AAT GGT GAA   867
Asn Phe Glu Leu Leu Ser Lys Asn Leu Pro Ser Val Ser Asn Gly Glu
785                 790                 795                 800

ATG ACG TCC TTC GTG AGG AAC TCT ATT GCT CCC CAG GTA CCT CTA AAA   915
Met Thr Ser Phe Val Arg Asn Ser Ile Ala Pro Gln Val Pro Leu Lys
                805                 810                 815

CTC CGC AAA ACA TTT TTG AAA GGG CTC GAT GAG GGA TCA CAT ATA AAA   963
Leu Arg Lys Thr Phe Leu Lys Gly Leu Asp Glu Gly Ser His Ile Lys
            820                 825                 830

ATT ACA CAA GCA AAG CGC ATC CCA GCT ACT TTG AGC AGA AAA AAG GGA  1011
Ile Thr Gln Ala Lys Arg Ile Pro Ala Thr Leu Ser Arg Lys Lys Gly
        835                 840                 845

GTG ATT GTG TTG GGA GAT GCA TTC AAC ATG CGT CAT CCC GTA ATC GCG  1059
Val Ile Val Leu Gly Asp Ala Phe Asn Met Arg His Pro Val Ile Ala
    850                 855                 860

TCG GGG ATG ATG GTT TTA TTG TCT GAC ATT CTC ATT CTA AGC CGT CTT  1107
Ser Gly Met Met Val Leu Leu Ser Asp Ile Leu Ile Leu Ser Arg Leu
865                 870                 875                 880

CTC AAG CCT TTG GGC AAC CTC GGT GAT GAA AAC AAA GTC TCA GAA GTT  1155
Leu Lys Pro Leu Gly Asn Leu Gly Asp Glu Asn Lys Val Ser Glu Val
                885                 890                 895

ATG AAG TCC TTC TAT GCT CTA CGC AAG CCA ATG TCA GCA ACA GTA AAC  1203
Met Lys Ser Phe Tyr Ala Leu Arg Lys Pro Met Ser Ala Thr Val Asn
            900                 905                 910

ACA CTA GGG AAT TCA TTT TGG CAA GTG CTA ATT GCT TCA ACG GAC GAA  1251
Thr Leu Gly Asn Ser Phe Trp Gln Val Leu Ile Ala Ser Thr Asp Glu
        915                 920                 925

GCA AAA GAG GCC ATG CGA CAA GGT TGC TTT GAT TAC CTC TCT AGT GGT  1299
Ala Lys Glu Ala Met Arg Gln Gly Cys Phe Asp Tyr Leu Ser Ser Gly
    930                 935                 940

GGG TTT CGC ACG TCA GGC TTG ATG GCT CTG ATT GGT GGC ATG AAC CCT  1347
Gly Phe Arg Thr Ser Gly Leu Met Ala Leu Ile Gly Gly Met Asn Pro
945                 950                 955                 960

AGG CCA CTT TCT CTC TTC TAT CAT CTA TTC GTT ATT TCT TTA TCC TCC  1395
Arg Pro Leu Ser Leu Phe Tyr His Leu Phe Val Ile Ser Leu Ser Ser
                965                 970                 975

ATT GGC CAA CTG CTC TCT CCA TTC CCC ACT CCT CTT CGT GTT TGG CAT  1443
Ile Gly Gln Leu Leu Ser Pro Phe Pro Thr Pro Leu Arg Val Trp His
            980                 985                 990

AGC CTC AGA CTT CTT GAT TTG TCT TTG AAA ATG TTG GTT CCT CAT CTC  1491
Ser Leu Arg Leu Leu Asp Leu Ser Leu Lys Met Leu Val Pro His Leu
        995                 1000                1005

AAG GCC GAA GGA ATA GGT CAA ATG TTG TCT CCA ACA AAT GCA GCG GCG  1539
Lys Ala Glu Gly Ile Gly Gln Met Leu Ser Pro Thr Asn Ala Ala Ala
    1010                1015                1020

TAT CGC AAA AGC TAT ATG GCT GCA ACC GTT GTC TAG ACATTGATGA 1585 
Tyr Arg Lys Ser Tyr Met Ala Ala Thr Val Val 
1025 1030 1035 

AATATAGATG GTGCACAAAT CTTTGTGATT GTGGATTTGT GAAAATAGTA TTGCAATATG 1645 

TTACTGAAGA AACTTTTCCT TATCCACTTA TAAGTGGAAA TAGGAAGAAT GTGTATATAT 1705 

GTAAGGGGTG ACAATTATTT TGAAATAAAA TTAAGAAAAT AAC 1746 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 518 amino acids 
(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Asp Met Ala Phe Val Glu Val Cys Leu Arg Met Leu Leu Val Phe 
1 5 10 15 

Val Leu Ser Trp Thr Ile Phe His Val Asn Asn Arg Lys Lys Lys Lys 
20 25 30 

Ala Thr Lys Leu Ala Asp Leu Ala Thr Glu Glu Arg Lys Glu Gly Gly 
35 40 45 

Pro Asp Val Ile Ile Val Gly Ala Gly Val Gly Gly Ser Ala Leu Ala 
50 55 60 

Tyr Ala Leu Ala Lys Asp Gly Arg Arg Val His Val Ile Glu Arg Asp 
65 70 75 80 

Met Arg Glu Pro Val Arg Met Met Gly Glu Phe Met Gln Pro Gly Gly 
85 90 95 

Arg Leu Met Leu Ser Lys Leu Gly Leu Gln Asp Cys Leu Glu Glu Ile 
100 105 110 

Asp Ala Gln Lys Ser Thr Gly Ile Arg Leu Phe Lys Asp Gly Lys Glu 
115 120 125 

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Thr Val Ala Cys Phe Pro Val Asp Thr Asn Phe Pro Tyr Glu Pro Ser 
130 135 140 

Gly Arg Phe Phe His Asn Gly Arg Phe Val Gln Arg Leu Arg Gln Lys 
145 150 155 160 

Ala Ser Ser Leu Pro Asn Val Arg Leu Glu Glu Gly Thr Val Arg Ser 
165 170 175 

Leu Ile Glu Glu Lys Gly Val Val Lys Gly Val Thr Tyr Lys Asn Ser 
180 185 190 

Ser Gly Glu Glu Thr Thr Ser Phe Ala Pro Leu Thr Val Val Cys Asp 
195 200 205 

Gly Cys His Ser Asn Leu Arg Arg Ser Leu Asn Asp Asn Asn Ala Glu 
210 215 220 

Val Thr Ala Tyr Glu Ile Gly Tyr Ile Ser Arg Asn Cys Arg Leu Glu 
225 230 235 240 

Gln Pro Asp Lys Leu His Leu Ile Met Ala Lys Pro Ser Phe Ala Met 
245 250 255 

Leu Tyr Gln Val Ser Ser Thr Asp Val Arg Cys Asn Phe Glu Leu Leu 
260 265 270 

Ser Lys Asn Leu Pro Ser Val Ser Asn Gly Glu Met Thr Ser Phe Val 
275 280 285 

Arg Asn Ser Ile Ala Pro Gln Val Pro Leu Lys Leu Arg Lys Thr Phe 
290 295 300 

Leu Lys Gly Leu Asp Glu Gly Ser His Ile Lys Ile Thr Gln Ala Lys 
305 310 315 320 



Arg Ile Pro Ala Thr Leu Ser Arg Lys Lys Gly Val Ile Val Leu Gly 
325 330 335 

Asp Ala Phe Asn Met Arg His Pro Val Ile Ala Ser Gly Met Met Val 
340 345 350 

Leu Leu Ser Asp Ile Leu Ile Leu Ser Arg Leu Leu Lys Pro Leu Gly 
355 360 365 

Asn Leu Gly Asp Glu Asn Lys Val Ser Glu Val Met Lys Ser Phe Tyr 
370 375 380 

Ala Leu Arg Lys Pro Met Ser Ala Thr Val Asn Thr Leu Gly Asn Ser 
385 390 395 400 

Phe Trp Gln Val Leu Ile Ala Ser Thr Asp Glu Ala Lys Glu Ala Met 
405 410 415 

Arg Gln Gly Cys Phe Asp Tyr Leu Ser Ser Gly Gly Phe Arg Thr Ser 
420 425 430 

Gly Leu Met Ala Leu Ile Gly Gly Met Asn Pro Arg Pro Leu Ser Leu 
435 440 445 

Phe Tyr His Leu Phe Val Ile Ser Leu Ser Ser Ile Gly Gln Leu Leu 
450 455 460 

Ser Pro Phe Pro Thr Pro Leu Arg Val Trp His Ser Leu Arg Leu Leu 
465 470 475 480 

Asp Leu Ser Leu Lys Met Leu Val Pro His Leu Lys Ala Glu Gly Ile 
485 490 495 

Gly Gln Met Leu Ser Pro Thr Asn Ala Ala Ala Tyr Arg Lys Ser Tyr 
500 505 510 

Met Ala Ala Thr Val Val 
515 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1893 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Brassica napus 
(B) STRAIN: Westar 
(D) DEVELOPMENTAL STAGE: 14 day greening-etiolated 
(F) TISSUE TYPE: hypocotyls 

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: Tsang 
(B) CLONE: pDR411 

(ix) FEATURE: 

(A) NAME/KEY: 5'UTR 
(B) LOCATION: 1..28 

(ix) FEATURE: 

(A) NAME/KEY: exon 
(B) LOCATION: 29..1466 

(ix) FEATURE: 

(A) NAME/KEY: intron 
(B) LOCATION: 1467..1623 

(ix) FEATURE: 

(A) NAME/KEY: exon 
(B) LOCATION: 1624..1697 

(ix) FEATURE: 

(A) NAME/KEY: 3'UTR 
(B) LOCATION: 1698..1893 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

CCACGCGTCC GCGGACGCGT GGGCAGATAT GGATCTAGCT TTTCCGCACG TTTGTTTGTG 60 

GACGCTACTC GCCTTTGTGC TGACTTGGAC AGTGTTCTAC GTCAACAACA GGAGGAAGAA 120 

GGTGGCGAAG TTACCCGATG CGGCGACAGA GGTGAGAAGA GACGGTGATG CTGACGTCAT 180 

CATCGTCGGA GCTGGTGTTG GAGGTTCAGC TCTCGCCTAC GCTCTTGCAA AGGATGGGCG 240 

TCGAGTACAT GTGATAGAGA GGGACATGAG GGAACCAGTG AGAATGATGG GTGAATTTAT 300 

GCAACCCGGT GGACGACTAC TGCTTTCTAA GCTTGGTCTT GAAGATTGTT TGGAGGGAAT 360 

AGATGAACAG ATAGCCACAG GCTTAGCAGT TTATAAGGAC GGACAAAAAG CACTCGTGTC 420 

TTTTCCAGAG GACAACGACT TTCCTTATGA ACCTACTGGT CGAGCTTTTT ATAATGGCCG 480 

TTTTGTCCAG AGACTGCGCC AAAAGGCTTC TTCGCTCCCC ACTGTACAAC TTGAAGAAGG 540 

GACTGTAAAA TCTTTGATAG AAGAAAAAGG AGTGATCAAA GGAGTGACAT ACAAGAATAG 600 

TGCAGGCGAA GAAACGACTG CATTTGCACC TCTCACAGTG GTATGCGACG GTTGCTATTC 660 

AAACCTTCGT CGGTCTGTTA ACGACAACAA TGCGGAGGTT ATATCGTACC AAGTTGGTTA 720 

CGTCTCAAAG AATTGTCAGC TTGAAGATCC TGAAAAGTTA AAATTGATAA TGTCTAAACC 780 

TTCCTTCACC ATGTTGTATC AAATAAGCAG CACCGATGTT CGTTGTGTTA TGGAGATTTT 840 

CCCCGGCAAT ATTCCTTCTA TTTCAAATGG CGAAATGGCT GTTTATTTGA AAAATACTAT 900 

GGCTCCTCAG GTACCTCCAG AACTCCGCAA AATATTTTTG AAAGGAATTG ATGAGGGAGC 960 

ACAAATTAAA GCGATGCCAA CAAAGAGAAT GGAAGCTACT TTGAGCGAAA AGCAAGGAGT 1020 

GATTGTGTTG GGAGATGCAT TCAACATGCG CCACCCAGCG ATTGCCTCTG GAATGATGGT 1080 

TGTATTATCT GACATTCTCA TTCTACGCCG CCTTCTCCAG CCATTGCGAA ACCTCAGTGA 1140 

TGCAAATAAA GTATCAGAAG TTATTAAGTC ATTTTATGTC ATCCGAAAGC CAATGTCAGC 1200 

GACGGTGAAC ACGCTAGGAA ATGCATTTTC TCAAGTGCTA ATTGCATCTA CGGACGAAGC 1260 

AAAAGAAGCG ATGCGACAAG GCTGTTTTGA TTACCTCTCT AGTGGCGGCT TTCGCACGTC 1320 

AGGAATGATG GCTCTGCTCG GTGGCATGAA CCCTCGACCA CTCTCTCTCA TCTTTCATCT 1380 

ATGTGGTATT ACTCTATCCT CCATTGGTCA ACTGCTCTCG CCATTTCCAT CTCCTCTTGG 1440 

CATTTGGCAT AGCCTCAGAC TTTTTGGTGT AAGTCATTAT CTCCCTCCCT ATGTTATTTA 1500 

CATATTTTTC TTTGTGTTAT ATATTTTGTA AATAATTTAC AATTGAATTT TGACATTTTC 1560 

TTGTTGTTTA TGTGTATGCC TAATTGTCTA TGAAAATGTT GGTTCCTCAT CTTAAGGCTG 1620 

AAGGGGTTAG CCAAATGCTG TCTCCAGCAT ACGCAGCCGC GTATCGCAAA AGCTATATGA 1680 

CCGCAACCGC TCTCTAAGCA TCGATGATAA GAACCGCGAA TGATACTATG ACATATTTGG 1740 

AGCGCTAGTA TTTTGTGGTT TTGCATCCGT TAAAAATTTA AAATGTGTTG CTGTGTGTTT 1800 

ACTATTATTA GTGTATTACC TGGAAAATAC CCGTGGGTAT ATTCTAAATG TATAAAATAT 1860 

TGTGATAAAT AAAACGACTC TCCGTTTGGT TGG 1893 
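
As a consistency check on the feature table for SEQ ID NO: 5, the following Python sketch adds up the annotated spans. It assumes that the locations are 1-based and inclusive and that the last codon of the second exon is a stop codon; the names `FEATURES` and `span_length` are illustrative only and do not come from the specification.

```python
# Feature table for SEQ ID NO: 5 as listed above: (key, start, end).
# Assumption: coordinates are 1-based and inclusive.
FEATURES = [
    ("5'UTR", 1, 28),
    ("exon", 29, 1466),
    ("intron", 1467, 1623),
    ("exon", 1624, 1697),
    ("3'UTR", 1698, 1893),
]


def span_length(start: int, end: int) -> int:
    """Number of nucleotides in an inclusive 1-based span."""
    return end - start + 1


# All annotated spans together should cover the full 1893 base pairs.
total_nt = sum(span_length(s, e) for _, s, e in FEATURES)

# Only the exons contribute to the reading frame.
coding_nt = sum(span_length(s, e) for key, s, e in FEATURES if key == "exon")
codons = coding_nt // 3

# Assumption: the final codon of the second exon is a stop codon.
residues = codons - 1

print(total_nt, coding_nt, codons, residues)
```

Under these assumptions the spans cover all 1893 base pairs, and the two exons together give 1512 nucleotides, that is 504 codons, or 503 residues plus a stop codon. This agrees with the length of 503 amino acids stated for SEQ ID NO: 11.
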
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 572 amino acids 
(B) TYPE: amino acid 
(C) STRANDEDNESS: 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mus musculus 
(B) STRAIN: B6CBA 
(D) DEVELOPMENTAL STAGE: 6-8 weeks 
(F) TISSUE TYPE: liver 

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: Lambda ZAP vector Stratagene catalog # 935302 
(B) CLONE: pMMSE-17 

(x) PUBLICATION INFORMATION: 

(A) AUTHORS: Kosuga, K. 
Hata, S. 
Osumi, T. 
Sakakibara, J. 
Ono, T. 
(B) TITLE: Nucleotide sequence of a cDNA for 
squalene epoxidase 
(C) JOURNAL: Biochim. Biophys. Acta 
(D) VOLUME: 1260 
(E) ISSUE: 3 
(F) PAGES: 345-348 
(G) DATE: 1995 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 



Met Trp Thr Phe Leu Gly Ile Ala Thr Phe Thr Tyr Phe Tyr Lys Lys 
1 5 10 15 

Cys Gly Asp Val Thr Leu Ala Asn Lys Glu Leu Leu Leu Cys Val Leu 
20 25 30 

Val Phe Leu Ser Leu Gly Leu Val Leu Ser Tyr Arg Cys Arg His Arg 
35 40 45 

His Gly Gly Leu Leu Gly Arg His Gln Ser Gly Ala Gln Phe Ala Ala 
50 55 60 

Phe Ser Asp Ile Leu Ser Ala Leu Pro Leu Ile Gly Phe Phe Trp Ala 
65 70 75 80 

Lys Ser Pro Glu Ser Glu Lys Lys Glu Gln Leu Glu Ser Lys Lys Cys 
85 90 95 

Arg Lys Glu Ile Gly Leu Ser Glu Thr Thr Leu Thr Gly Ala Ala Thr 
100 105 110 

Ser Val Ser Thr Ser Phe Val Thr Asp Pro Glu Val Ile Ile Val Gly 
115 120 125 

Ser Gly Val Leu Gly Ser Ala Leu Ala Ala Val Leu Ser Arg Asp Gly 
130 135 140 

Arg Lys Val Thr Val Ile Glu Arg Asp Leu Lys Glu Pro Asp Arg Ile 
145 150 155 160 

Val Gly Glu Leu Leu Gln Pro Gly Gly Tyr Arg Val Leu Gln Glu Leu 
165 170 175 

Gly Leu Gly Asp Thr Val Glu Gly Leu Asn Ala His His Ile His Gly 
180 185 190 

Tyr Ile Val His Asp Tyr Glu Ser Arg Ser Glu Val Gln Ile Pro Tyr 
195 200 205 

Pro Leu Ser Glu Thr Asn Gln Val Gln Ser Gly Ile Ala Phe His His 
210 215 220 

Gly Arg Phe Ile Met Ser Leu Arg Lys Ala Ala Met Ala Glu Pro Asn 
225 230 235 240 

Val Lys Phe Ile Glu Gly Val Val Leu Gln Leu Leu Glu Glu Asp Asp 
245 250 255 

Ala Val Ile Gly Val Gln Tyr Lys Asp Lys Glu Thr Gly Asp Thr Lys 
260 265 270 

Glu Leu His Ala Pro Leu Thr Val Val Ala Asp Gly Leu Phe Ser Lys 
275 280 285 



Phe Arg Lys Ser Leu Ile Ser Ser Lys Val Ser Val Ser Ser His Phe 
290 295 300 

Val Gly Phe Leu Met Lys Asp Ala Pro Gln Phe Lys Pro Asn Phe Ala 
305 310 315 320 

Glu Leu Val Leu Val Asn Pro Ser Pro Val Leu Ile Tyr Gln Ile Ser 
325 330 335 

Ser Ser Glu Thr Arg Val Leu Val Asp Ile Arg Gly Glu Leu Pro Arg 
340 345 350 

Asn Leu Arg Glu Tyr Met Ala Glu Gln Ile Tyr Pro Gln Leu Pro Glu 
355 360 365 

His Leu Lys Glu Ser Phe Leu Glu Ala Ser Gln Asn Gly Arg Leu Arg 
370 375 380 

Thr Met Pro Ala Ser Phe Leu Pro Pro Ser Ser Val Asn Lys Arg Gly 
385 390 395 400 

Val Leu Ile Leu Gly Asp Ala Tyr Asn Leu Arg His Pro Leu Thr Gly 
405 410 415 

Gly Gly Met Thr Val Ala Leu Lys Asp Ile Lys Leu Trp Arg Gln Leu 
420 425 430 

Leu Lys Asp Ile Pro Asp Leu Tyr Asp Asp Ala Ala Ile Phe Gln Ala 
435 440 445 

Lys Lys Ser Phe Phe Trp Ser Arg Lys Arg Thr His Ser Phe Val Val 
450 455 460 

Asn Val Leu Ala Gln Ala Leu Tyr Glu Leu Phe Ser Ala Thr Asp Asp 
465 470 475 480 

Ser Leu His Gln Leu Arg Lys Ala Cys Phe Leu Tyr Phe Lys Leu Gly 
485 490 495 

Gly Glu Cys Val Thr Gly Pro Val Gly Leu Leu Ser Ile Leu Ser Pro 
500 505 510 

His Pro Leu Val Leu Ile Arg His Phe Phe Ser Val Ala Ile Tyr Ala 
515 520 525 

Thr Tyr Phe Cys Phe Lys Ser Glu Pro Trp Ala Thr Lys Pro Arg Ala 
530 535 540 

Leu Phe Ser Ser Gly Ala Val Leu Tyr Lys Ala Cys Ser Ile Leu Phe 
545 550 555 560 

Pro Leu Ile Tyr Ser Glu Met Lys Tyr Leu Val His 
565 570 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 amino acids 

(B) TYPE: amino acid 
(C) STRANDEDNESS: 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Rattus norvegicus 
(F) TISSUE TYPE: kidney 
(H) CELL LINE: NRK 

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: pcD2 library of H. Okayama 
(B) CLONE: Tb-1 

(x) PUBLICATION INFORMATION: 

(A) AUTHORS: Sakakibara, J. 
Watanabe, R. 
Kanai, R. 
Ono, T. 
(B) TITLE: Molecular cloning and expression of rat 
squalene epoxidase 
(C) JOURNAL: J. Biol. Chem. 
(D) VOLUME: 270 
(E) ISSUE: 1 
(F) PAGES: 17-20 
(G) DATE: 1995 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 



Met Trp Thr Phe Leu Gly Ile Ala Thr Phe Thr Tyr Phe Tyr Lys Lys 
1 5 10 15 

Cys Gly Asp Val Thr Leu Ala Asn Lys Glu Leu Leu Leu Cys Val Leu 
20 25 30 

Val Phe Leu Ser Leu Gly Leu Val Leu Ser Tyr Arg Cys Arg His Arg 
35 40 45 

Asn Gly Gly Leu Leu Gly Arg His Gln Ser Gly Ser Gln Phe Ala Ala 
50 55 60 

Phe Ser Asp Ile Leu Ser Ala Leu Pro Leu Ile Gly Phe Phe Trp Ala 
65 70 75 80 

Lys Ser Pro Pro Glu Ser Glu Lys Lys Glu Gln Leu Glu Ser Lys Arg 
85 90 95 

Arg Arg Lys Glu Val Asn Leu Ser Glu Thr Thr Leu Thr Gly Ala Ala 
100 105 110 

Thr Ser Val Ser Thr Ser Ser Val Thr Asp Pro Glu Val Ile Ile Ile 
115 120 125 

Gly Ser Gly Val Leu Gly Ser Ala Leu Ala Thr Val Leu Ser Arg Asp 
130 135 140 

Gly Arg Thr Val Thr Val Ile Glu Arg Asp Leu Lys Glu Pro Asp Arg 
145 150 155 160 

Ile Leu Gly Glu Cys Leu Gln Pro Gly Gly Tyr Arg Val Leu Arg Glu 
165 170 175 

Leu Gly Leu Gly Asp Thr Val Glu Ser Leu Asn Ala His His Ile His 
180 185 190 

Gly Tyr Val Ile His Asp Cys Glu Ser Arg Ser Glu Val Gln Ile Pro 
195 200 205 

Tyr Pro Val Ser Glu Asn Asn Gln Val Gln Ser Gly Val Ala Phe His 
210 215 220 

His Gly Lys Phe Ile Met Ser Leu Arg Lys Ala Ala Met Ala Glu Pro 
225 230 235 240 

Asn Val Lys Phe Ile Glu Gly Val Val Leu Arg Leu Leu Glu Glu Asp 
245 250 255 

Asp Ala Val Ile Gly Val Gln Tyr Lys Asp Lys Glu Thr Gly Asp Thr 
260 265 270 

Lys Glu Leu His Ala Pro Leu Thr Val Val Ala Asp Gly Leu Phe Ser 
275 280 285 

Lys Phe Arg Lys Asn Leu Ile Ser Asn Lys Val Ser Val Ser Ser His 
290 295 300 

Phe Val Gly Phe Ile Met Lys Asp Ala Pro Gln Phe Lys Ala Asn Phe 
305 310 315 320 

Ala Glu Leu Val Leu Val Asp Pro Ser Pro Val Leu Ile Tyr Gln Ile 
325 330 335 

Ser Pro Ser Glu Thr Arg Val Leu Val Asp Ile Arg Gly Glu Leu Pro 
340 345 350 



Arg Asn Leu Arg Glu Tyr Met Thr Glu Gln Ile Tyr Pro Gln Ile Pro 
355 360 365 

Asp His Leu Lys Glu Ser Phe Leu Glu Ala Cys Gln Asn Ala Arg Leu 
370 375 380 

Arg Thr Met Pro Ala Ser Phe Leu Pro Pro Ser Ser Val Asn Lys Arg 
385 390 395 400 

Gly Val Leu Leu Leu Gly Asp Ala Tyr Asn Leu Arg His Pro Leu Thr 
405 410 415 

Gly Gly Gly Met Thr Val Ala Leu Lys Asp Ile Lys Ile Trp Arg Gln 
420 425 430 

Leu Leu Lys Asp Ile Pro Asp Leu Tyr Asp Asp Ala Ala Ile Phe Gln 
435 440 445 

Ala Lys Lys Ser Phe Phe Trp Ser Arg Lys Arg Ser His Ser Phe Val 
450 455 460 

Val Asn Val Leu Ala Gln Ala Leu Tyr Glu Leu Phe Ser Ala Thr Asp 
465 470 475 480 

Asp Ser Leu Arg Gln Leu Arg Lys Ala Cys Phe Leu Tyr Phe Lys Leu 
485 490 495 

Gly Gly Glu Cys Leu Thr Gly Pro Val Gly Leu Leu Ser Ile Leu Ser 
500 505 510 

Pro Asp Pro Leu Leu Leu Ile Arg His Phe Phe Ser Val Ala Val Tyr 
515 520 525 

Ala Thr Tyr Phe Cys Phe Lys Ser Glu Pro Trp Ala Thr Lys Pro Arg 
530 535 540 

Ala Leu Phe Ser Ser Gly Ala Ile Leu Tyr Lys Ala Cys Ser Ile Ile 
545 550 555 560 

Phe Pro Leu Ile Tyr Ser Glu Met Lys Tyr Leu Val His 
565 570 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 496 amino acids 
(B) TYPE: amino acid 
(C) STRANDEDNESS: 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Saccharomyces cerevisiae 
(B) STRAIN: A2-M8 

(x) PUBLICATION INFORMATION: 

(A) AUTHORS: Jandrositz, A. 
Hoegenauer, G. 
Turnowsky, F. 
(B) TITLE: The gene encoding squalene epoxidase from 
Saccharomyces cerevisiae: cloning and 
characterization 
(C) JOURNAL: Gene 
(D) VOLUME: 107 
(F) PAGES: 155-160 
(G) DATE: 1991 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Met Ser Ala Val Asn Val Ala Pro Glu Leu Ile Asn Ala Asp Asn Thr 
1 5 10 15 

Ile Thr Tyr Asp Ala Ile Val Ile Gly Ala Gly Val Ile Gly Pro Cys 
20 25 30 

Val Ala Thr Gly Leu Ala Arg Lys Gly Lys Lys Val Leu Ile Val Glu 
35 40 45 

Arg Asp Trp Ala Met Pro Asp Arg Ile Val Gly Glu Leu Met Gln Pro 
50 55 60 

Gly Gly Val Arg Ala Leu Arg Ser Leu Gly Met Ile Gln Ser Ile Asn 
65 70 75 80 

Asn Ile Glu Ala Tyr Pro Val Thr Gly Tyr Thr Val Phe Phe Asn Gly 
85 90 95 

Glu Gln Val Asp Ile Pro Tyr Pro Tyr Lys Ala Asp Ile Pro Lys Val 
100 105 110 

Glu Lys Leu Lys Asp Leu Val Lys Asp Gly Asn Asp Lys Val Leu Glu 
115 120 125 

Asp Ser Thr Ile His Ile Lys Asp Tyr Glu Asp Asp Glu Arg Glu Arg 
130 135 140 

Gly Val Ala Phe Val His Gly Arg Phe Leu Asn Asn Leu Arg Asn Ile 
145 150 155 160 

Thr Ala Gln Glu Pro Asn Val Thr Arg Val Gln Gly Asn Cys Ile Glu 
165 170 175 

Ile Leu Lys Asp Glu Lys Asn Glu Val Val Gly Ala Lys Val Asp Ile 
180 185 190 

Asp Gly Arg Gly Lys Val Glu Phe Lys Ala His Leu Thr Phe Ile Cys 
195 200 205 

Asp Gly Ile Phe Ser Arg Phe Arg Lys Glu Leu His Pro Asp His Val 
210 215 220 

Pro Thr Val Gly Ser Ser Phe Val Gly Met Ser Leu Phe Asn Ala Lys 
225 230 235 240 

Asn Pro Ala Pro Met His Gly His Val Ile Phe Gly Ser Asp His Met 
245 250 255 

Pro Ile Leu Val Tyr Gln Ile Ser Pro Glu Glu Thr Arg Ile Leu Cys 
260 265 270 

Ala Tyr Asn Ser Pro Lys Val Pro Ala Asp Ile Lys Ser Trp Met Ile 
275 280 285 

Lys Asp Val Gln Pro Phe Ile Pro Lys Ser Leu Arg Pro Ser Phe Asp 
290 295 300 

Glu Ala Val Ser Gln Gly Lys Phe Arg Ala Met Pro Asn Ser Tyr Leu 
305 310 315 320 

Pro Ala Arg Gln Asn Asp Val Thr Gly Met Cys Val Ile Gly Asp Ala 
325 330 335 

Leu Asn Met Arg His Pro Leu Thr Gly Gly Gly Met Thr Val Gly Leu 
340 345 350 

His Asp Val Val Leu Leu Ile Lys Lys Ile Gly Asp Leu Asp Phe Ser 
355 360 365 

Asp Arg Glu Lys Val Leu Asp Glu Leu Leu Asp Tyr His Phe Glu Arg 
370 375 380 

Lys Ser Tyr Asp Ser Val Ile Asn Val Leu Ser Val Ala Leu Tyr Ser 
385 390 395 400 

Leu Phe Ala Ala Asp Ser Asp Asn Leu Lys Ala Leu Gln Lys Gly Cys 
405 410 415 

Phe Lys Tyr Phe Gln Arg Gly Gly Asp Cys Val Asn Lys Pro Val Glu 
420 425 430 

Phe Leu Ser Gly Val Leu Pro Lys Pro Leu Gln Leu Thr Arg Val Phe 
435 440 445 

Phe Ala Val Ala Phe Tyr Thr Ile Tyr Leu Asn Met Glu Glu Arg Gly 
450 455 460 

Phe Leu Gly Leu Pro Met Ala Leu Leu Glu Gly Ile Met Ile Leu Ile 
465 470 475 480 

Thr Ala Ile Arg Val Phe Thr Pro Phe Leu Phe Gly Glu Leu Ile Gly 
485 490 495 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 536 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA to mRNA 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Arabidopsis thaliana 
(B) STRAIN: Columbia 
(D) DEVELOPMENTAL STAGE: 4 different stages and ti 

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: Lambda-PRL2 
(B) CLONE: 250F2T7 

(x) PUBLICATION INFORMATION: 

(A) AUTHORS: Newman, T. 
deBruijn, F. J. 
Green, P. 
Keegstra, K. 
Kende, H. 
McIntosh, L. 
Ohlrogge, J. 
Raikhel, N. 
Somerville, S. 
Thomashow, M. 
(B) TITLE: Genes galore: a summary of methods for 
accessing results from large-scale partial 
sequencing of anonymous Arabidopsis cDNA clones 
(C) JOURNAL: Plant Physiol. 
(D) VOLUME: 106 
(F) PAGES: 1241-1255 
(G) DATE: 1994 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

GAGAACATAT AAAAGCCATG CCAACAAAGA AGATGACAGC TACTTTGAGC GAGAAGAAAG 60 

GAGTGATTTT ATTGGGAGAT GCATTCAACA TGCGTCATCC AGCAATCGCA TCTGGAATGA 120 

TGGTTTTATT ATCTGACATT CTCATTCTAC GCCGTCTTCT CCAGCCATTA AGCAACCTTG 180 

GCAATGCGCA AAAAATCTCA CAAGTTATCA AGTCCTTTTA TGATATCCGC AAGCCAATGT 240 

CAGCGACAGT TAACACGTTA GGAAATGCAT TCTCTCAAGT GCTAGTTGCA TCGACGGACG 300 

AAGCAAAAGA GGCAATGAGA CAAGGTTGCT ATGATTACCT CTCTAGTGGT GGGTTTCGCA 360 

CGTCAGGGAT GATGGCTTTG CTAGGCGGAT GAACCCTCGT CCGATCTCTC NCATCNANCA 420 

NCNAGGGGAA CACNCANCCC CATNGGCATC AACNCCNCAT TCCCNNCCCT TCGATTGGAA 480 

CCTCGACTTT TGGTGGNNNA AAGGTGGCCC CCCANGGGAA GGTTCCATNT NTCCNC 536 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 540 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA to mRNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Ricinus communis 
(B) STRAIN: Baker 296 
(D) DEVELOPMENTAL STAGE: immature castor fruits 
(F) TISSUE TYPE: endosperm and embryo 

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: lambdaZAPST 
(B) CLONE: pcrs547 

(x) PUBLICATION INFORMATION: 

(A) AUTHORS: van de Loo, F. J. 
Turner, S. 
Somerville, C. 
(B) TITLE: Expressed sequence tags from developing 
castor seeds 
(C) JOURNAL: Plant Physiol. 
(D) VOLUME: 108 
(F) PAGES: 1141-1150 
(G) DATE: 1995 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

TTTGAGCTCA GAGTCACAGA TATAGACATC CTAGGGAAAA CATTCTCCTA TAAACTAAAG 60 

CGTATTACAA TTCACACTTC TTTTCCCCTC AACTTTGATT TGAACAAAGG GATGAGATTA 120 

AAACCAAAAT GAGAAACGCC CCGTTCCTTC TTGTCACGAA TTTTTCACTC ACATTCTTGT 180 

CAAACTAATT GCATTCAACA GGAGGAGCTC TATAATATGC TGGGACGGTT GCGGGGAAGA 240 

ACATCTGTCT AACTCCTTCT GCCTTGATAA TGGGGAAGAT GATTCCTGAT GCACCCGATA 300 

TCAACCTAGC TCCAACCCAG ACGCGCTTAG GTGAAGGGAA TGGCAGTAAC AAAGGGGGGG 360 

CCCGGTACCC AATTTGCCCT ATAGTGAGCC GTATTCAATN ACTGGCCGTT GTTTCAACGT 420 

GTGCCTTGGG AAACCCTGGG GTNCCACTTA TTGCTTCAGA CATCCCCTTT GCANTTGGTA 480 

TTNGAGGGGC CGACCGTTGC CTCCAANAGT NCNCOTTNAA TTGGGTTGAA ANTTNCGGGA 540 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 503 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Met Asp Leu Ala Phe Pro His Val Cys Leu Trp Thr Leu Leu Ala Phe 
1 5 10 15 

Val Leu Thr Trp Thr Val Phe Tyr Val Asn Asn Arg Arg Lys Lys Val 
20 25 30 

Ala Lys Leu Pro Asp Ala Ala Thr Glu Val Arg Arg Asp Gly Asp Ala 
35 40 45 

Asp Val Ile Ile Val Gly Ala Gly Val Gly Gly Ser Ala Leu Ala Tyr 
50 55 60 

Ala Leu Ala Lys Asp Gly Arg Arg Val His Val Ile Glu Arg Asp Met 
65 70 75 80 

Arg Glu Pro Val Arg Met Met Gly Glu Phe Met Gln Pro Gly Gly Arg 
85 90 95 



Leu Leu Leu Ser Lys Leu Gly Leu Glu Asp Cys Leu Glu Gly Ile Asp 
100 105 110 

Glu Gln Ile Ala Thr Gly Leu Ala Val Tyr Lys Asp Gly Gln Lys Ala 
115 120 125 

Leu Val Ser Phe Pro Glu Asp Asn Asp Phe Pro Tyr Glu Pro Thr Gly 
130 135 140 

Arg Ala Phe Tyr Asn Gly Arg Phe Val Gln Arg Leu Arg Gln Lys Ala 
145 150 155 160 

Ser Ser Leu Pro Thr Val Gln Leu Glu Glu Gly Thr Val Lys Ser Leu 
165 170 175 

Ile Glu Glu Lys Gly Val Ile Lys Gly Val Thr Tyr Lys Asn Ser Ala 
180 185 190 

Gly Glu Glu Thr Thr Ala Phe Ala Pro Leu Thr Val Val Cys Asp Gly 
195 200 205 

Cys Tyr Ser Asn Leu Arg Arg Ser Val Asn Asp Asn Asn Ala Glu Val 
210 215 220 

Ile Ser Tyr Gln Val Gly Tyr Val Ser Lys Asn Cys Gln Leu Glu Asp 
225 230 235 240 

Pro Glu Lys Leu Lys Leu Ile Met Ser Lys Pro Ser Phe Thr Met Leu 
245 250 255 

Tyr Gln Ile Ser Ser Thr Asp Val Arg Cys Val Met Glu Ile Phe Pro 
260 265 270 

Gly Asn Ile Pro Ser Ile Ser Asn Gly Glu Met Ala Val Tyr Leu Lys 
275 280 285 

Asn Thr Met Ala Pro Gln Val Pro Pro Glu Leu Arg Lys Ile Phe Leu 
290 295 300 

Lys Gly Ile Asp Glu Gly Ala Gln Ile Lys Ala Met Pro Thr Lys Arg 
305 310 315 320 

Met Glu Ala Thr Leu Ser Glu Lys Gln Gly Val Ile Val Leu Gly Asp 
325 330 335 

Ala Phe Asn Met Arg His Pro Ala Ile Ala Ser Gly Met Met Val Val 
340 345 350 

Leu Ser Asp Ile Leu Ile Leu Arg Arg Leu Leu Gln Pro Leu Arg Asn 
355 360 365 

Leu Ser Asp Ala Asn Lys Val Ser Glu Val Ile Lys Ser Phe Tyr Val 
370 375 380 

Ile Arg Lys Pro Met Ser Ala Thr Val Asn Thr Leu Gly Asn Ala Phe 
385 390 395 400 

Ser Gln Val Leu Ile Ala Ser Thr Asp Glu Ala Lys Glu Ala Met Arg 
405 410 415 

Gln Gly Cys Phe Asp Tyr Leu Ser Ser Gly Gly Phe Arg Thr Ser Gly 
420 425 430 

Met Met Ala Leu Leu Gly Gly Met Asn Pro Arg Pro Leu Ser Leu Ile 
435 440 445 

Phe His Leu Cys Gly Ile Thr Leu Ser Ser Ile Gly Gln Leu Leu Ser 
450 455 460 

Pro Phe Pro Ser Pro Leu Gly Ile Trp His Ser Leu Arg Leu Phe Gly 
465 470 475 480 

Val Ser Gln Met Leu Ser Pro Ala Tyr Ala Ala Ala Tyr Arg Lys Ser 
485 490 495 

Tyr Met Thr Ala Thr Ala Leu 
500 

CLAIMS: 

1. An isolated and cloned DNA suitable for introduction 
into a genome of a plant to suppress expression of 
squalene epoxidase by said plant below natural levels, 
characterised in that said DNA has a sequence 
corresponding at least in part to a squalene epoxidase 
gene of a plant. 

2. DNA according to claim 1, characterised by a 
sequence corresponding to all or part of a specific 
sequence selected from SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID 
NO: 5, SEQ ID NO: 9 and SEQ ID NO: 10; or having at least 
60% homology thereto. 

3. DNA according to claim 2, characterised in that said 
part of said sequence comprises at least 20 consecutive 
nucleotides of said specific sequence. 

4. DNA according to claim 2, characterised in that said 
part of said sequence comprises at least 100 consecutive 
nucleotides of said specific sequence. 
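
The two numerical criteria in claims 2 to 4 (at least 60% homology, and at least 20 or 100 consecutive nucleotides of a specific sequence) can be illustrated with a minimal Python sketch. The claims do not state here how homology is to be measured, so the sketch assumes plain percent identity between two ungapped sequences of equal length; the function names `percent_identity` and `shares_run` are hypothetical and the example strings are toy data, not sequences from the listing.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two ungapped, equal-length sequences.

    Assumption: "homology" is approximated by simple positional identity.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


def shares_run(candidate: str, reference: str, min_len: int = 20) -> bool:
    """True if candidate contains at least min_len consecutive nucleotides of reference."""
    candidate, reference = candidate.upper(), reference.upper()
    if len(candidate) < min_len or len(reference) < min_len:
        return False
    # Collect every window of length min_len from the reference.
    windows = {
        reference[i:i + min_len]
        for i in range(len(reference) - min_len + 1)
    }
    # Any longer shared run necessarily contains a shared window of length min_len.
    return any(
        candidate[i:i + min_len] in windows
        for i in range(len(candidate) - min_len + 1)
    )


# Toy data only: a 30-nucleotide reference and a candidate carrying 22 of its bases.
toy_reference = "ACGTTGCAAGCTTAGGCATCGATCGGATCC"
toy_candidate = "TTTT" + toy_reference[4:26] + "AAAA"

print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
print(shares_run(toy_candidate, toy_reference, min_len=20))
print(shares_run(toy_candidate, toy_reference, min_len=25))
```

In practice a gapped alignment would normally be computed before identity is counted; the ungapped comparison is used here only to keep the sketch short.
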

5. A process of producing genetically-modified plants 
having increased levels of squalene in tissues of the 
plants compared to corresponding wild-type plants, 
wherein the plant genome is modified to suppress 
expression of squalene epoxidase by said plant, 
characterised in that said genome is modified by 
introducing at least one exogenous DNA sequence that 
corresponds, at least in part, to one or more endogenous 
squalene epoxidase genes of said plant. 

6. A process according to claim 5, characterised in 
that said DNA sequence introduced into said plant genome 
has at least 60% homology to said one or more of said 
endogenous squalene epoxidase genes. 

7. A process according to claim 5, characterised in 
that said exogenous DNA has a sequence corresponding to 
all or part of a specific sequence selected from SEQ ID 
NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 9 and SEQ ID 
NO: 10; or has at least 60% homology thereto. 

8. A process according to claim 7, characterised in 
that said part of said sequence comprises at least 20 
consecutive nucleotides of said specific sequence. 

9. A process according to claim 7, characterised in 
that said part of said sequence comprises at least 100 
consecutive nucleotides of said specific sequence. 

10. A process as claimed in claim 5, characterised in 
that said at least one DNA sequence introduced into said 
genome is arranged in a sense orientation relative to a 
transcriptional promoter such that it is capable of 
decreasing said expression by co-suppression or homology- 
dependent gene silencing. 

11. A process as claimed in claim 5, characterised in 
that said at least one DNA sequence introduced into said 
genome forms part of a gene encoding a ribozyme that is 
capable of catalysing endonucleolytic cleavage of said 
one or more of the endogenous squalene epoxidase genes of 
said plant. 

12. A process as claimed in claim 5, characterised in 
that said exogenous DNA is obtained by identifying at 
least one squalene epoxidase gene of said plant, and 
sequencing and cloning the gene or at least a part 
thereof. 

13. A process according to claim 5, characterised in 
that said exogenous DNA sequence is introduced into said 
plant by a procedure selected from Agrobacterium-mediated 
and particle gun transformation techniques. 

14. A process of producing genetically-modified plants 
having increased levels of squalene in tissues of the 
plants compared to corresponding wild-type plants, 
wherein the plant genome is modified to suppress 
expression of squalene epoxidase by said plant by 
introducing a nucleotide sequence that reduces or 
prevents expression of squalene epoxidase into a genome 
of said plant, characterised in that said DNA includes a 
transcriptional promoter and a sequence arranged such 
that when transcribed from the promoter, resulting RNA is 
complementary or antisense to all or part of at least one 
squalene epoxidase messenger RNA transcribed from a 
squalene epoxidase gene of said plant. 
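
The antisense relationship recited in claim 14 can be sketched in Python as follows. The sketch assumes that the messenger RNA has the same sequence as the sense strand of the cDNA (with U in place of T), so that the antisense RNA is the reverse complement of that strand. The function names are hypothetical, and the 20-nucleotide fragment is simply the start of SEQ ID NO: 9, used for illustration only.

```python
# Complement table for DNA; N is kept as N because the listing uses it
# for undetermined bases.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """Write a DNA sequence as RNA by replacing T with U."""
    return dna.upper().replace("T", "U")


def antisense_rna(sense_dna: str) -> str:
    """RNA complementary to the mRNA transcribed from the given sense strand."""
    return to_rna(reverse_complement(sense_dna))


# First 20 nucleotides of SEQ ID NO: 9, taken as sense-strand cDNA.
fragment = "GAGAACATATAAAAGCCATG"

print(to_rna(fragment))         # sequence of the corresponding mRNA stretch
print(antisense_rna(fragment))  # RNA able to base-pair with that stretch
```

In a vector of the kind claimed, the same effect is obtained by placing the cloned fragment in reverse orientation relative to the promoter, so that transcription yields the antisense strand.
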

15. A process according to claim 14, characterised in 
that said nucleotide sequence comprises all or part of a 
sequence selected from the group consisting of SEQ ID 
NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 9 and SEQ ID 
NO: 10; or is a sequence having at least 60% homology 
thereto. 



16. Plasmid pDR411 (ATCC 97845). 

17. Plasmid pDR111 (ATCC 97846). 

18. Plasmid p129F12T7 (ATCC 97847). 

19. A vector for introducing a nucleotide sequence into 
a plant genome, characterised in that said vector 
comprises a construct containing a nucleotide sequence 
that is antisense to a plant squalene epoxidase gene or a 
part thereof, positioned between a transcriptional 
promoter segment and a transcriptional termination 
segment. 

20. A vector according to claim 19, characterised in 
that said nucleotide sequence comprises all or part of a 
specific sequence selected from SEQ ID NO: 1, SEQ ID NO: 3, 
SEQ ID NO: 5, SEQ ID NO: 9 and SEQ ID NO: 10; or has at 
least 60% homology thereto. 

21. Vector pSE129A (ATCC 97910). 

22. Vector pSE411A (ATCC 97908). 

23. Vector pSE111A (ATCC 97909). 

24. A genetically-modified plant capable of accumulating 
squalene at levels higher than the corresponding wild-type 
plant, characterised in that said genetically-modified 
plant has been produced by a process according 
to claim 5, claim 6, claim 7, claim 8, claim 9, claim 10, 
claim 11, claim 12, claim 13, claim 14 or claim 15. 

25. A seed of a genetically-modified oilseed plant 
containing squalene at levels higher than seeds of 
equivalent wild-type plants, characterised in that said 
genetically-modified plant has been produced by a process 
according to claim 5, claim 6, claim 7, claim 8, claim 9, 
claim 10, claim 11, claim 12, claim 13, claim 14 or claim 
15. 

26. A process of producing squalene, characterised by 
growing a genetically-modified plant as defined in claim 
24, harvesting said plant or seeds of said plant, and 
extracting squalene from said harvested plant or seeds. 